mal" conditions. On the other hand, it is evident that
this return to normal will not be violent, but gradual. 

The interpretation of statistical data at any time is 
full of difficulties; at the present time the difficulties 
are extreme. If the fluctuations of statistical series 
return to normal, will the transition take a few months 
or several years ? Is a tendency to return already 
evident ? What are the peculiarities of current data ? 
These are questions reserved for consideration in subse- 
quent numbers of the Review, and in the Monthly 
Supplements which will be devoted to the analysis of 
current statistics. 

The Index of General Business Conditions presented 
has been constructed from the data for the period 1903 
to 1914, and is offered as a guide to the relations that 
exist among economic phenomena in ordinary peace 
times. It obviously is of no value in interpreting the 
connection that existed during the Great War, but is 
offered in the belief that it will prove serviceable in the 
study of conditions following the establishment of peace. 
While the war may have changed many things, it prob- 
ably has not altered the fundamental conditions which 
in ordinary times produce fluctuations in business 
activity. It may be expected, therefore, that, as condi- 
tions return to normal, the relations that existed be- 
tween most, if not all, of our series of economic and busi- 
ness statistics will tend to be reestablished upon the 
basis which obtained from 1903 to 1914. To the extent 
that this expectation proves justified, the Index of 
General Business Conditions should prove a serviceable 
guide to the fundamental economic movements of the 
years that lie ahead. 

The statements made in the preceding sections are 
conclusions. These conclusions have been presented, 
so far as possible, in non-technical language. They have 
been drawn, however, only by the application to busi- 
ness data of a considerable amount of mathematical 
and statistical technique. The soundness of the analysis 
and technique conditions the soundness of the conclu- 
sions reached. In accordance with scientific usage we 
shall give a complete explanation of the basis of our 
conclusions so that it will be possible for anyone in- 
terested to follow the argument step by step. This 
explanation, however, cannot be made entirely non- 
technical. The individual items used are themselves 
composite; they are symptomatic of the activity of a 
complex economic organization; the methods devel- 
oped cover a large portion of the theory of statistics; 
the concepts involved demand careful examination; 
the nature of economic correlation and causation re- 
quires scrutiny. 

Consideration of the nature of the data used, a 
detailed description of the statistical methods developed 
for correcting the original items and comparing the 
corrected series, and definitions of concepts, these are 
presented in Part II of this study. The attempt will be 
made to present the ideas and arguments clearly and 
non-technically. The complexities of the subject, however,
make the difficulties of exposition considerable.
We shall attempt to secure simplicity and accuracy in 
statement. When there appears to be a choice between 
the two, however, the latter will not be sacrificed to the 
former. 

Anyone interested only in results and willing to accept 
our results on authority will find them summarized in 
the preceding sections, and in Section 3 of Part II; any- 
one desiring to examine in full the bases of our conclu- 
sions should read Part II, and study the data presented 
in Parts III and IV. 



II. THE METHOD USED 

1. Résumé of the Preceding Study

In the January number of The Review of Economic
Statistics fifteen monthly series of business
statistics were presented and analyzed. Each item was 
conceived to be a compound or composite, that is, each 
actual figure was conceived to be a magnitude deter- 
mined by the concurrent action of distinct causes. A 
method was developed of measuring and eliminating 
those constituents of the items of time series ascribable 
to secular trend and seasonal variation. The method 
developed was applied to the fifteen series of business 
statistics covering periods of from sixteen to twenty- 
six years. 

The need of eliminating secular trend and seasonal 
variation from statistical series before attempting to 
employ them as indices of business conditions, as well 
as the method which we have used in such elimination, 
may be made clear by a simple illustration. Our pur- 
pose is to judge the significance of items for different 
dates. Let us take the production of pig iron in the 
United States at any three dates some distance apart 
from each other. The tonnage of pig iron produced in 
the United States in March 1904 was 1,447,000, the 
tonnage in February 1908 was 1,077,000, while the 
tonnage in December 1918 was 3,434,000. Which one 
of these figures indicates the greatest relative activity 
of the iron industry, relative, that is, with respect to the 
facilities of the industry at the time in question ? To 
answer this question we must take account of the fact 
that the country and the iron industry have grown in 
the interval from March 1904 to December 1918. It is 
obvious that if the growth were by equal monthly 
increments, a " normal " of 1,528,000 tons in March 
1904, for instance, would not be normal in 1908 or 1918. 
On the assumption of uniform growth the normal of 
February 1908 would be 1,901,000 tons, and that of 
December 1918 would be 2,932,000 tons. The actual 
production figures for the three months are 1,447,000 
tons, 1,077,000 tons, and 3,434,000 tons, respectively. 
The ratios of the actual figures to the corresponding 
" normal " figures are 95 per cent, 57 per cent, and 117 
per cent, respectively. Omitting the percentage signs, 
the figures just given are termed indices. They are 
indices of pig-iron production so constructed, by the
use of bases increasing with time, that the normal
growth, or "secular trend," is eliminated.¹

TABLE B. RANGE OF PERCENTAGE DEVIATIONS OF ORIGINAL ITEMS FROM SECULAR TREND
CORRECTED FOR SEASONAL VARIATION

 1. New York Clearings               -36 (December 1907) to +49 (January 1906)
 2. Production of Pig Iron           -44 (December 1903, January 1908, May 1908) to +27 (December 1909)
 3. Outside Clearings                -23 (December 1907) to +13 (January and May 1907)
 4. Bradstreet's Prices *            -9.1 (June 1908) to +9.1 (March 1907)
 5. Imports                          -32 (March 1908) to +26 (December 1906)
 6. Building Permits                 -51 (March 1908) to +48 (August 1911)
 7. Railroad Gross Earnings          -16 (July 1908) to +16 (May 1907)
 8. Shares Traded *                  -71 (June 1904) to +128 (January 1906)
11. Business Failures (Bradstreet)   -27 (January 1910) to +34 (January 1908)
12. Yield of 10 Railroad Bonds *     -4.1 (April and May 1909) to +11.3 (November 1907)
13. Rate on 4-6 Months Paper         -21 (September 1908) to +38 (July 1913)
14. Rate on 60-90 Day Paper          -31 (December 1908) to +45 (June 1913) and +53 (December 1907)

* Not corrected for seasonal variation.
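The arithmetic of the pig-iron illustration can be retraced in a few lines. The sketch below is only an illustration in modern notation: the tonnages and "normal" values are those quoted in the text, while the function name `trend_index` and the straight-line check are assumptions introduced here, not part of the original method description.

```python
# Tonnages and trend ("normal") values quoted in the text, in tons of pig iron.
actual = {"March 1904": 1_447_000, "February 1908": 1_077_000, "December 1918": 3_434_000}
normal = {"March 1904": 1_528_000, "February 1908": 1_901_000, "December 1918": 2_932_000}


def trend_index(actual_tons, normal_tons):
    """Actual production as a rounded percentage of the trend value."""
    return round(100 * actual_tons / normal_tons)


indices = {month: trend_index(actual[month], normal[month]) for month in actual}

# Check that the quoted normals lie on one straight line ("equal monthly increments").
months_1904_to_1918 = (1918 - 1904) * 12 + (12 - 3)   # March 1904 to December 1918
months_1904_to_1908 = (1908 - 1904) * 12 + (2 - 3)    # March 1904 to February 1908
monthly_increment = (normal["December 1918"] - normal["March 1904"]) / months_1904_to_1918
implied_normal_1908 = normal["March 1904"] + monthly_increment * months_1904_to_1908

print(indices)
print(round(monthly_increment), round(implied_normal_1908))
```

The implied February 1908 normal agrees with the quoted 1,901,000 tons to within rounding, which is what uniform growth requires.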

But there is another type of fluctuation which must 
be removed before the figures may be used as indices of 
business conditions. We must remove from the data 
those variations which are purely seasonal in character, 
which recur year after year as a consequence of the 
round of the seasons. If we could assume that the con- 
ditions of all the months of the year are alike, that there 
is no characteristic and recurrent variation which regu- 
larly differentiates the item for March from that for 
February or December, then we could legitimately 
compare the indices 95, 57, and 117. But the iron in- 
dustry is seasonal in character. Just as a normal pro- 
duction for 1904 is not normal for 1918, so, within any 
year, a " normal " production for March is not normal 
for February or December. If we represent the average 
monthly production for any year by 100, the produc- 
tion for March, February, and December would be 
represented by 106, 94, and 100, respectively. Conse- 
quently, the seasonal indices 106, 94, and 100 should 
be subtracted from the indices of 95, 57, and 117 found 
above, in order to get significant figures for comparison.
The differences are -11, -37, and +17.² These
differences are "percentage deviations of original items
from secular trend corrected for seasonal variation."
They are comparable with each other; and they measure
the relative activity (with respect to existing
equipment) of the iron industry in March 1904, February
1908, and December 1918. The deviations reveal
the depression in March 1904, the deep depression of
February 1908, and the great activity of December
1918. These deviations become still more significant
when we compare them with others computed in the
same manner for the period 1903-18. The months of
deepest depression in the period were December 1903
(-44), January 1904 (-38), January to June 1908
(ranging from -37 to -44), November and December
1914 (-40 and -41). The months of greatest activity
were March 1905 to October 1907 (ranging from +9 to
a maximum of +26 in July 1907), August 1909 to April
1910 (ranging from +12 to +27), and September 1915
to December 1918 (ranging, with five exceptions, from
+10 to +24). Negative deviations of -14 and -13
occurred in January and February 1918, as exceptional
items in a series of forty positive deviations. They
were undoubtedly caused by the severity of the winter,
the railroad tie-up, and the fuel difficulties of the months
in question. December 1917 (+2) and March 1918
(+6) were also low.

¹ See Memoranda, p. 210 for a derivation of the slope of the line of
secular trend, based upon the concept of arithmetic average increments
between various pairs of years entering the period in question.

² We might, of course, compute ratios, instead of differences, and
take deviations from 100. The resulting figures would be
$\frac{95}{106}\times 100 - 100$, $\frac{57}{94}\times 100 - 100$, and
$\frac{117}{100}\times 100 - 100$, or -10, -39, and +17,
respectively. It is obvious that except in extreme cases the deviations
do not differ appreciably from those secured by the easier arithmetic
process. The arithmetic process slightly tones down the extreme
numbers, which is probably desirable.
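The seasonal correction for the three months of the illustration, by subtraction and by the alternative ratio process, can be set side by side. This is a minimal sketch using only the indices quoted in the text; the variable names are assumptions chosen for readability.

```python
# Trend indices and seasonal indices for March 1904, February 1908, December 1918,
# as quoted in the text.
trend_indices = [95, 57, 117]
seasonal_indices = [106, 94, 100]

# The "easier arithmetic process": subtract the seasonal index from the trend index.
by_difference = [t - s for t, s in zip(trend_indices, seasonal_indices)]

# The alternative: take the ratio as a percentage and measure its deviation from 100.
by_ratio = [round(100 * t / s - 100) for t, s in zip(trend_indices, seasonal_indices)]

print(by_difference)
print(by_ratio)
```

The two processes differ only for the middle item, the most extreme of the three, which the subtraction tones down slightly.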

Other series of business statistics were treated in a 
fashion similar to that just explained for pig-iron pro- 
duction. For instance, the " percentage deviations of 
original items from secular trend corrected for seasonal 
variation" of the monthly rate of interest on sixty-to-ninety
day commercial paper in New York for March
1904, February 1908, and December 1918 were -6, +12,
and +39, respectively. The range of the deviations for
the period 1903-18 was from -36 (November 1915) to
+61 (June 1918); previous to the war period the deviations
ranged from minima of -29 in August 1904 and
September 1908 and -31 in December 1908, to maxima
of +53 in December 1907 and +45 in June 1913. The
range of the " percentage deviations of original items 
from secular trend corrected for seasonal variation " of 
twelve series for the period January 1903-July 1914 is 
given in Table B. 

The ranges of the deviations of the series just given 
are, respectively: 85, 71, 36, 18.2, 58, 99, 32, 199, 61,
15.4, 59, and 84 points. With this variety of range from
15.4 to 199 points in fluctuations of the percentage 
deviations, it is clear that one per cent deviation does 
not indicate the same degree of fluctuation in all series. 
The percentage deviations of a given series are compara- 
ble with each other but a new unit must be found to 
secure comparability of the deviations of one series with 
those of another series. The proper selection of such a 
unit is the problem at hand. 
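The twelve ranges just quoted follow directly from the extremes listed in Table B. A short sketch, with the extremes transcribed from that table and the dictionary name `table_b` introduced here only for illustration:

```python
# (lowest, highest) percentage deviation for each series, transcribed from Table B.
table_b = {
    "New York Clearings": (-36, 49),
    "Production of Pig Iron": (-44, 27),
    "Outside Clearings": (-23, 13),
    "Bradstreet's Prices": (-9.1, 9.1),
    "Imports": (-32, 26),
    "Building Permits": (-51, 48),
    "Railroad Gross Earnings": (-16, 16),
    "Shares Traded": (-71, 128),
    "Business Failures (Bradstreet)": (-27, 34),
    "Yield of 10 Railroad Bonds": (-4.1, 11.3),
    "Rate on 4-6 Months Paper": (-21, 38),
    "Rate on 60-90 Day Paper": (-31, 53),
}

# Range = highest deviation minus lowest deviation, in percentage points.
ranges = {name: round(high - low, 1) for name, (low, high) in table_b.items()}

for name, points in ranges.items():
    print(f"{name:32s}{points:>7}")
```

The spread from 15.4 points for bond yields to 199 points for shares traded is the reason a one per cent deviation cannot mean the same thing in every series.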
TABLE C. PERCENTAGE DISTRIBUTION OF THE CYCLES OF VARIOUS MONTHLY SERIES WITH
RESPECT TO MULTIPLES OF THEIR STANDARD DEVIATIONS, 1903-16

                                                             Percentage Distribution
                                     Number of  Standard   Between    Between    Between    Over   Under
Series                               Items      Deviation  +1 and -1  +2 and -2  +3 and -3   +3     -3

 1. New York Clearings                 168       20.3         72         95         99        1      0
 2. Production of Pig Iron             168       [...]        71         95        100        0      0
 3. Outside Clearings                  168        8.62        77         95         98        2      0
 4. Bradstreet's Prices *              144        3.68        70         96        100        0      0
 5. Imports                            168       11.91        75         94         99        1      0
 6. Building Permits                   168       20.4         66         98         99        1      0
 7. Railroad Gross Earnings            168        6.07        70         96        100        0      0
 8. Shares Traded                      163       49.6         78         96         98        2      0
 9. Unfilled Orders U. S. S. C. †       56       32.3         62         96        100        0      0
11. Business Failures (Bradstreet)     168       13.55        72         96         99        1      0
12. Rate on 10 Railroad Bonds          168        2.82        69         98         99        1      0
13. Rate on 4-6 Months Paper           162       16.46        67         96        100        0      0
14. Rate on 60-90 Day Paper            168       19.66        69         96        100        0      0

* Period 1903-14.
† Quarterly figures.



Let us retrace the steps leading to this problem of 
obtaining a unit to secure comparability. The indi- 
vidual adjusted series are presumably corrected for 
secular trend and seasonal variation. The residual con- 
stituents of the original composite items are, then, the 
irregular and cyclical fluctuations. In the process of 
adjustment the original items were expressed as per- 
centages of the corresponding ordinates of secular 
trend. Consequently, the corrected items of a given 
series, being relatives, are comparable over a period of 
years; that is, account has been taken of a change in 
the level of fluctuation. When we come to compare, 
however, the residual items, or " percentage deviations 
of the original items from the corresponding ordinates 
of secular trend corrected for seasonal variation," of 
two or more series with each other, a difficulty is met. 
The percentages do not mean the same thing in different 
series; they are not comparable. Some series fluctuate 
more violently than others; the items (or variates) of 
different series are more or less variable. 

In order, then, to make a legitimate comparison of
the fluctuations of the various series with each other,
such series must be expressed in terms of proper units.
The units must be selected with respect to the problem
in hand and, therefore, must depend upon the variability
of the series. What is needed is some common
measure of variability, in terms of which each of the
series may be stated. Such a measure is the "standard
deviation," the most widely used index of variability.
The standard deviation is an average depending
upon all of the items and varying directly with the dispersion
or scatter of the items; the wider the distribution
of the items the greater is the standard deviation.¹

In comparing the fluctuations of the several corrected
series, therefore, we have expressed the percentage
deviations in terms of the respective standard
deviations as units.²

The figures thus obtained are termed "cycles."
These cycles, whether in numerical or graphic form, are
the appropriate magnitudes to compare or correlate.
Charts of the cycles are given in Part IV on pp. 199-205
following. It will be noticed that for the period 1903-16
there are very few cases in which the cycles are more
than +3.0 or less than -3.0. Table C shows the percentage
distribution of the cycles of thirteen series
having in all 2037 items. From two-thirds to three-fourths
of the items of each series range between +σ
and -σ, the standard deviations; from 94 to 98 per
cent of the items of each series range between +2σ and
-2σ; from 98 to 100 per cent of the items of each series
range between +3σ and -3σ; only nine out of the
total of 2037 items are over +3σ, and none are under
-3σ. It is clear that the use of the standard deviation
as a unit secures numerical comparability.

To the twelve series of cycles published in the
January issue of the Review there have been added
eight new series. The numerical values of the monthly
cycles of all twenty series are given in Part IV, pp. 188-196.
All the other data connected with the eight new series
are likewise given in Part III, pp. 161-182.

¹ The standard deviation is the square root of the arithmetic mean
of the squares of all deviations, deviations being measured from the
arithmetic mean of the observations. If x represents deviations from
the arithmetic mean, n the number of items, σ the standard deviation,
and Σ the process of summation, we have $\sigma^2 = \frac{1}{n}\Sigma(x^2)$.

"Since a small value for the standard deviation can arise only
when the variates are closely concentrated about the mean or mode,
and since a large value must be due to a relatively high frequency of
the variates near the extremes of the distribution, the standard deviation
is a measure of the dispersion of the data. Because the effect of
squaring is to diminish the importance of the smaller values and to
exaggerate the importance of the larger values, a small value for the
standard deviation shows conclusively that the data is (sic) highly
true to type and stable, while on the other hand a large value may to
some extent be due to the presence of the larger frequencies of the
extreme variates and hence not altogether significant. But even with
this qualification the standard deviation is a thoroughly practicable
and reliable index of the dispersion of the data." (West, Introduction
to Mathematical Statistics, p. 51.)

² See The Review of Economic Statistics, January 1919, p. 36.

The problem now before us is the measurement of 
the correlation between the cycles of the various series. 
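The conversion of percentage deviations into cycles, and the tabulation of cycles by multiples of the standard deviation, may be sketched as follows. The function names are assumptions made for this illustration, and the sketch assumes that each percentage deviation is simply divided by the standard deviation of its series without re-centering; the two check values use the standard deviation of 19.66 shown in Table C for the rate on 60-90 day paper and the deviations of +53 and -31 quoted earlier.

```python
import math


def standard_deviation(items):
    """Square root of the mean squared deviation from the arithmetic mean."""
    n = len(items)
    mean = sum(items) / n
    return math.sqrt(sum((x - mean) ** 2 for x in items) / n)


def to_cycles(percentage_deviations, sigma=None):
    """Express percentage deviations in units of the series' standard deviation.

    Assumption: the deviations are divided by sigma as they stand.
    """
    if sigma is None:
        sigma = standard_deviation(percentage_deviations)
    return [d / sigma for d in percentage_deviations]


def share_within(cycles, k):
    """Percentage of cycles lying between -k and +k standard deviations."""
    inside = sum(1 for c in cycles if -k <= c <= k)
    return 100 * inside / len(cycles)


# December 1907 (+53) and December 1908 (-31) for the rate on 60-90 day paper.
paper_cycles = [round(c, 1) for c in to_cycles([53, -31], sigma=19.66)]
print(paper_cycles)
```

The December 1907 value agrees with the entry of +2.7 printed for that month in Table D.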

2. The Method Adopted of Measuring Correlation
between the Corrected Series, Illustrated
by Pig-iron Production and Interest Rate on
Sixty-to-Ninety Day Commercial Paper

The corrected series which we have termed cycles
are expressed in proper units for comparison. What
method of measuring the correspondence or lack of correspondence
between the fluctuations shall we adopt?
To put the question with reference to a specified pair of
series, what is the degree of correlation between the
cyclical fluctuations of, say, the monthly tonnage of pig
iron produced in the United States (Chart 102, p. 201)
and the monthly rate of interest on sixty-to-ninety day
commercial paper in New York (Chart 114, p. 205)?

The obvious procedure is to plot the graphs for the
two series on the same chart, and examine such graphs
for similarity or dissimilarity of movement. Chart A
presents the cycles of pig-iron production and rate of
interest on sixty-to-ninety day commercial paper. In
comparing the two graphs let us limit ourselves for
the present to the period ending with July 1914.

Chart A. Comparison of cycles of (a) monthly tonnage of the production of pig iron in the United States, and (b) monthly rate of interest
on sixty-to-ninety day commercial paper in New York, 1903-18.

Upon examining Chart A the first feature to attract
notice is the similarity of the wave movements. It is
evident that the cycles correspond, or in other words,
that they are positively correlated. On the whole,
positive items of one series correspond to positive items
of the other series and negative items of one series correspond
to negative items of the other series.

A second feature of Chart A that commands immediate
notice is that the wave movements of the rate of
interest on sixty-to-ninety day commercial paper follow
or lag behind the wave movements of the tonnage of pig
iron produced in the United States. This fact of lag, and
it appears to be a fact, may be stated in terms of correlation.
The maximum degree of correlation appears to
be, not between items concurrent in time (synchronous),
but between the items of pig-iron production and those
of interest rate for, say, six months later. As would be
expected, the amount of lag necessary to secure the best
fit of the two graphs to each other varies. The graph of
pig-iron production for 1904-06 is closely similar to the
graph for interest rate for a year later, that is, 1905-07.
For this short period interest rate lags one year. During
the period 1907-11 the cycles of interest rate lag six
months behind pig-iron production. During the period
January 1912-July 1914 the lag of interest rate for
maximum correlation with pig-iron production appears
to be three or four months. Comparison of the two
graphs with the cycles of interest rate shifted to the
left one year, six months, and three months, respectively,
is made in Charts B, C, and D.

The systematic correspondence of the two wave 
movements, whether the lag of interest rate be one 
year, six months, three months, or any other interval, 
was definitely interrupted in August 1914. Immediately 
following the declaration of war interest rate rose 
violently and pig-iron production slumped. Interest 
rate on sixty-to-ninety day paper then fell to a low 
level in 1915-16 and recovered in 1917-18. Pig-iron 
production rose violently during 1915 and remained
high during 1916-18, except for the slump in the winter
of 1917-18, when there was a combination of intensely 
cold weather, fuelless days and railroad congestion. 

From the comparison which has just been made it 
would appear that in peace times movements in pig-iron 
production anticipate movements in interest rate on 
commercial paper by an interval of three to twelve 
months. In order to arrive at this conclusion from a 
comparison of the graphs, it is necessary to make, not 
merely one comparison of pairs of items for the same 
dates, but many comparisons, with items of one series 
paired with various items of the other series. 
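The repeated comparison described here, in which items of one series are paired with items of the other at various intervals, can be imitated numerically. The sketch below is an assumption-laden stand-in for the visual comparison, not the procedure actually used: it pairs each pig-iron cycle with the interest-rate cycle a given number of months later and reports the share of pairs having like signs. The helper names are hypothetical, and the cycles are those printed in Table D for January 1912 to July 1914.

```python
def pair_with_lag(standard, correlative, lag):
    """Pair standard[t] with correlative[t + lag], for a lag of zero or more months."""
    if lag < 0:
        raise ValueError("lag must be zero or positive")
    return list(zip(standard, correlative[lag:]))


def sign_agreement(pairs):
    """Share of pairs whose two items have like signs; pairs with a zero are ignored."""
    usable = [(a, b) for a, b in pairs if a != 0 and b != 0]
    if not usable:
        return float("nan")
    return sum(1 for a, b in usable if (a > 0) == (b > 0)) / len(usable)


# Cycles from Table D, January 1912 through July 1914.
pig_iron = [-.5, -.1, -.1, .1, .3, .4, .4, .5, .4, .5, .7, .9,
            1.0, .8, .5, .7, .7, .6, .5, .4, .3, .0, -.4, -1.0,
            -1.2, -.9, -.6, -.6, -1.0, -1.1, -1.0]
interest = [-.6, -.4, -.1, -.1, .1, .1, .4, .5, .8, 1.1, 1.1, 1.3,
            .7, 1.0, 1.8, 1.6, 1.5, 2.3, 2.2, 1.7, 1.2, 1.0, 1.0, 1.0,
            .3, -.2, -.3, -.4, -.1, .1, .4]

for lag in range(0, 7):
    pairs = pair_with_lag(pig_iron, interest, lag)
    print(lag, len(pairs), round(sign_agreement(pairs), 2))
```

Agreement in sign is a much cruder test than the fit of two curves judged by eye, and it takes no account of magnitude; it serves only to show what is meant by comparing the series at several lags.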

To secure such comparisons a mechanical contrivance
was devised. The cycle charts were drawn on
translucent tracing cloth. A box was constructed with
a glazed top illuminated by electric lights placed within
the box. The accompanying figure illustrates the
device.

Illuminated Box with Glazed Top to Facilitate Comparison of Cycle Charts.

Chart B. Comparison of cycles of (a) monthly
pig-iron production and (b) monthly interest
rate of one year later, 1903-07 (dates
refer to pig-iron production).

Chart C. Comparison of cycles of (a) monthly
pig-iron production and (b) monthly interest rate of six
months later, 1906-11 (dates refer to pig-iron production).

Chart D. Comparison of cycles of
(a) monthly pig-iron production and
(b) monthly interest rate of three
months later, Jan. 1911-April 1914
(dates refer to pig-iron production).

One cycle chart was placed upon another so that
the horizontal zero lines and the vertical year lines
coincided. The two graphs were then in position to
determine the correspondence in sign and magnitude
of concurrent items. A shift of the upper chart to the
right or left enabled the observer to estimate the relative
positions of the two graphs which would secure the
best fit of one curve to the other.

Twenty cycle charts were compared in this way,
making 190 comparisons by each of three independent
observers. Each observer recorded his results as
follows:

(a) Degree and sign of correlation: high, moderate,
low; + or -.
(b) Direction and extent of lag in months for the
maximum correlation during the entire period.
(c) Characterization of the consistency of lag for
maximum correlation: excellent, good, fair, poor.

A table showing a record of the observations for all
series is given in Part IV, p. 184. The results of the
comparison of the cycles of pig-iron production and
interest rate on sixty-to-ninety day commercial paper
are as follows:

COMPARISON OF CYCLES, 1903-14

2. Pig-iron production of the United States
compared with
14. Interest rate on sixty-to-ninety day commercial paper.

(a) Correlation positive and high.
(b) No. 14 lags 5 or 6 months for maximum correlation
with no. 2.
(c) Fairly systematic for lag of, say, 6 months in no. 14,
but varies from 3 to 12 months.

The graphic method of the measurement of correlation
has advantages; but it also has limitations. It is
reliable when the correspondence between the curves is
simple and regular. It is uncertain when the correlation
and lag are variable. The method is easily applied
and it yields valuable results. Those results, however,
are approximations dependent upon the judgment of
the observer. His personal equation, preconceived
notions, or theoretical bias influence his conclusions.
Such conclusions need to be checked up by a more
TABLE D. CYCLES OF (a) MONTHLY TONNAGE OF PIG IRON PRODUCED IN THE UNITED STATES AND
(b) MONTHLY INTEREST RATE ON 60-TO-90 DAY COMMERCIAL PAPER IN NEW YORK

               1903        1904        1905        1906        1907        1908        1909        1910
             (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)
January      +.3    .1  -2.0   -.3   +.6  -1.1  +1.2   +.1  +1.3  +1.3  -2.3  +1.9   -.4  -1.1  +1.4   +.2
February     +.2    .1  [..]    .0   +.3   -.9   +.9   +.4  +1.0  +1.4  -2.0   +.6   -.4   -.9  +1.1   +.2
March        +.3    .5  [..]  [..]   +.7  -1.0  +1.0   +.5   +.9  +1.5  -2.2  +1.0   -.8  -1.1  +1.0    .0
April        +.5    .2  [..]  [..]   +.8  [..]   +.9   +.8  +1.0  +1.3  -2.2  [..]  [..]  -1.0   +.8   +.4
May          +.7    .1  [..]  [..]   +.8  [..]   +.9   +.8  +1.1   +.9  -2.3  [..]  [..]  -1.0   +.5   +.5
June         +.9    .5  [..]  [..]   +.6  [..]   +.8   +.8  +1.3  +1.2  -2.1  [..]  [..]  -1.0   +.5   +.8
July         +.5   +.4  [..]  [..]   +.5  [..]   +.9   +.8  +1.4  +1.1  -1.8  [..]   +.4  -1.2   +.2  +1.1
August       +.5   +.5  [..]  [..]   +.7  [..]   +.6   +.9  +1.2  +1.3  -1.5  [..]   +.6   -.9    .0   +.7
September    +.4   +.3  [..]  [..]   +.8  [..]   +.7  +1.1  +1.0  +1.5  -1.4  [..]   +.9  -1.0   -.2   +.5
October      -.5   -.1  [..]  [..]   +.9  [..]  +1.0   +.8  +1.1  +1.7  -1.3  [..]  +1.1   -.3   -.4   +.5
November    -1.6   +.3  [..]  [..]  +1.0  [..]  +1.3   +.9   -.1  +2.2  -1.0  [..]  +1.3    .0   -.6   +.6
December    -2.3    .0  [..]  [..]  +1.1  [..]  +1.3   +.8  -1.8  +2.7   -.6  [..]  +1.4   -.2   -.9   -.5

               1911        1912        1913        1914        1915        1916        1917        1918
             (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)   (a)   (b)
January     [..]  [..]   -.5   -.6  +1.0   +.7  -1.2   +.3  -1.9   -.4  +1.1  [..]   +.8   -.6   -.7  +2.1
February    [..]  [..]   -.1   -.4   +.8  +1.0   -.9   -.2  -1.5   -.2  +1.1  [..]  +1.1   +.5   -.7  +2.6
March       [..]  [..]   -.1   -.1   +.5  +1.8   -.6   -.3  -1.3   -.8  +1.0  [..]   +.6   +.3   +.3  +2.7
April       [..]  [..]   +.1   -.1   +.7  +1.6   -.6   -.4  -1.1   -.4   +.9  [..]   +.9   +.6   +.6  +2.8
May         [..]  [..]   +.3   +.1   +.7  +1.5  -1.0   -.1   -.9   -.2  +1.1  [..]  +1.0  +1.4   +.8  +2.9
June        [..]  [..]   +.4   +.1   +.6  +2.3  -1.1   +.1   -.3   -.1  +1.1  [..]  +1.0  +1.8   +.9  +3.1
July        [..]  [..]   +.4   +.4   +.5  +2.2  -1.0   +.4   +.1   -.9  +1.2  [..]  +1.2  +1.1  +1.1  +2.8
August      [..]  [..]   +.5   +.5   +.4  +1.7  -1.0  +2.3   +.4  -1.0  +1.0  [..]   +.9   +.9  +1.0  +2.5
September   [..]  [..]   +.4   +.8   +.3  +1.2  -1.3  +2.4   +.5  -1.6  +1.0  [..]   +.7  +1.0  +1.0  +2.2
October     [..]  [..]   +.5  +1.1    .0  +1.0  -1.8  +2.0   +.7  -1.8  +1.3  [..]   +.7  +1.2   +.8  +2.1
November    [..]  [..]   +.7  +1.1   -.4  +1.0  -2.1  +1.1   +.8  -1.8  +1.1  [..]   +.7  +1.5   +.8  +2.3
December    [..]  [..]   +.9  +1.3  -1.0  +1.0  -2.1   -.5  +1.1  -1.8   +.8  [..]   +.1  +1.4   +.9  +2.0

[..] marks an entry that is lost or cannot be assigned to a month. The entries
for January to June 1903 (b) have lost their signs. Entries that survive but
cannot be assigned to months are, in order: 1905 (b), after March, -.7, -1.0,
-.9, -.8, +.1, +.2; 1916 (b), eleven entries, -1.2, -.8, -1.0, -.9, -.8, -.1,
+.1, -.6, -1.4, -1.5, -1.1.

objective method of measuring correlation. There is 
need of a quantitative measure, of a coefficient of corre- 
lation, which will enable us to decide, for instance, 
whether the correlation between pig-iron production 
and interest rate is greater or less than the correlation 
between wholesale prices and interest rate. There is 
need of such an average or index, moreover, to tell us 
what lag to take, on the whole, when examination re- 
veals a variable lag. In other words, the inspection of 
the graphs with the aid of the illuminated box is an 
excellent preliminary operation. The results obtained 
give a valuable guide for further investigation; but a 
quantitative measure of correlation is needed to supple- 
ment and test the tentative conclusions which the 
graphic method has provided. 

To secure the quantitative measure desired it is 
necessary to define correlation in quantitative terms. 
In the first place, correlation between two series may be 
either direct (positive) or inverse (negative). Direct 
correlation exists when positive items of the first series 
are associated with positive items of the second series, 
and negative items of the first series are associated with 
negative items of the second series. In the second place, 
correlation between two series may be numerically high 
or low. Correlation is high if large items of the first 
series are associated with large items of the second series, 
and small items of the first series are associated with 
small items of the second series. Correlation is low if 



large items of the first series are associated with small 
items of the second series and vice versa. Correlation is 
zero if items of the first series, negative and positive, 
large and small, are indiscriminately associated with 
the various items of the second series. For convenience 
in referring to the two series the first will be called the 
standard series and the second will be called the correlative
series.¹

In attempting to obtain a numerical measure of cor- 
relation in accordance with the concept just defined, we 
have as our material the cycles, or in other words, " the 
percentage deviations of original items from secular 
trend corrected for seasonal variation " expressed in 
comparable units, the respective standard deviations. 

¹ The definition of correlation given by A. L. Bowley (Elements of
Statistics, p. 316) is as follows: "When two quantities are so related
that the fluctuations in one are in sympathy with the fluctuations of
the other, so that an increase or decrease of one is found in connection
with an increase or decrease (or inversely) of the others, and the
greater the magnitude of the changes in the one, the greater the magnitude
of the changes in the other, the quantities are said to be
correlated." This is really a definition of correlation for the first differences
(changes in adjacent items) of one series and the associated first
differences of another series. In the present discussion we are interested
in the correlation between cycles, not in the correlation between
first differences of cycles. Generally, if the correlation is high for
cycles it will be high for first differences; but the exceptions are important.
For instance, two series may have opposite month-to-month
oscillations, these oscillations being superimposed upon like cyclical
movements. In such case, the correlation between first differences
would be inverse and between cycles would be direct.

Let us consider the numerical values of the cycles of
pig-iron production and the corresponding values, either
concurrent or with a definite lag, of the cycles of the in- 
terest rate on sixty-to-ninety day commercial paper in 
New York. What items shall we pair ? The prelimi- 
nary graphic comparison indicated that the amount of 
lag of interest rate ranged, at different periods, from 
three months to a year; maximum correlation for the 
whole period January 1903-July 1914 appeared for a lag 
of six months. Accordingly, we will set up a one-to-one 
correspondence between the items of pig-iron production 
and those of interest rate six months later. The items 
for the two series are given in Table D. 

Beginning with January 1903 for pig-iron production 
and July 1903 for interest rate, consider the pairs of 
items: (+.3, +.4), (+.2, +.5), (+.3, +.3), (+.5, 
-.1), (+.7, +.3), (+.9, 0), (+.5, -.3), (+.5, 0), 
. . . , (-1.2, +.4). The last two items are for January 
and July 1914, respectively. 

Consider the products of the paired items, +.12, 
+.10, +.09, -.05, +.21, 0, -.15, 0, . . . , -.48, re- 
spectively. Positive products occur if the factors are of 
like algebraic sign, negative products occur if the factors 
are of unlike signs. Moreover, the products vary even 
though the sum of the factors is constant. Thus, the 
sums of the first two pairs are each +.7 but the products 
are +.12 and +.10, respectively. The larger product 
results from factors nearly equal in size, rather than 
from a large and a small factor. This statement may 
be generalized. The maximum product of two factors 
having a constant sum occurs when the factors are 
equal. 1 
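
The arithmetic of the paired products can be checked directly. What follows is a minimal sketch in Python using only the eight pairs quoted above (Table D itself is not reproduced here). The names `pairs`, `products`, and `best_split` are illustrative, and the grid search is merely one simple way to see the constant-sum property.

```python
# First eight (pig-iron, interest-rate-six-months-later) pairs quoted in the text.
pairs = [(0.3, 0.4), (0.2, 0.5), (0.3, 0.3), (0.5, -0.1),
         (0.7, 0.3), (0.9, 0.0), (0.5, -0.3), (0.5, 0.0)]

# Product of each pair: positive for like signs, negative for unlike signs.
products = [round(x * y, 2) for x, y in pairs]
print(products)

def best_split(total, steps=100):
    """Search a grid of splits (x, total - x) and return the one with the largest product."""
    candidates = [total * k / steps for k in range(steps + 1)]
    x = max(candidates, key=lambda v: v * (total - v))
    return x, total - x

# For a constant sum of .7 the largest product comes from two equal factors.
print(best_split(0.7))
```

The first two pairs both sum to .7, yet the pair with the more nearly equal factors gives the larger product, as the text states.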

Let us consider the properties of the products of 
pairs of items with respect to the problem of finding a 
coefficient of correlation. Suppose we start out with 
the items of a certain series, pig-iron production, as the 
given data. What series can we build up or construct 
artificially which possesses the maximum degree of cor- 
relation with the given series ? It is obvious that a 
series constructed with items identical with those of pig- 
iron production is most highly correlated with it. Such 
a series would, for a given total to distribute as items, 
also give maximum values for all the products of cor- 
responding pairs. 2 Furthermore, all products would be 
positive. On the other hand, if we should build up a 
series by merely changing the signs of the items of pig- 
iron production and pair the items of the two series, the 
products would be maxima numerically, but all negative 
in sign. Thought of graphically, the curve presenting 
the series so constructed would be the reflection of the 
former curve in a horizontal mirror. The inverse correla- 
tion would be a maximum in this case. Between the two 
extreme cases there could be constructed various series 
having various degrees of correlation with pig-iron pro- 
duction. By varying the magnitudes, or the signs, or 
both, a series could be devised more or less similar to a 
given series. 

It is clear, then, that the greater the sum of the prod- 
ucts of corresponding items, the greater will be the 
correlation between the two series, the maximum sum 



of products occurring when equal items are paired, that 
is, when the two series are identical. The sum of prod- 
ucts of corresponding pairs of items may, therefore, be 
used as a measure of correlation. In order that one 
value, however, computed from, say, 50 items, may 
legitimately be compared with another value, com- 
puted from another number of items, say 100, it appears 
best to divide the sum of the products by their number. 
That is to say, we shall use as our coefficient of correlation 
the arithmetic mean of the products of corresponding pairs 
of cycles. 3 
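
The definition just given may be sketched in a few lines of Python. This is an illustration under stated assumptions, not the computing routine actually used: the helper names `standardize` and `correlation` are hypothetical, deviations are taken from the arithmetic mean rather than from a fitted secular trend, and the short series is only the first eight pig-iron items quoted above, re-expressed in units of their own standard deviation.

```python
from math import sqrt

def standardize(series):
    """Express deviations from the mean in units of the series' standard deviation."""
    n = len(series)
    mean = sum(series) / n
    sd = sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd for v in series]

def correlation(x, y):
    """Arithmetic mean of the products of corresponding standardized items."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)

# Illustrative stretch only: the first eight pig-iron items quoted in the text.
x = [0.3, 0.2, 0.3, 0.5, 0.7, 0.9, 0.5, 0.5]
same = correlation(x, x)                    # a series paired with itself
mirrored = correlation(x, [-v for v in x])  # the same series with signs reversed
print(round(same, 6), round(mirrored, 6))
```

Pairing a series with itself gives the upper limit of the scale, and pairing it with its mirror image gives the lower limit.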

The coefficient cannot have a numerical value greater 
than unity; it takes that value when the two series 
paired have corresponding cycles of like numerical 
values. In such case the coefficient will be + 1 when the 
algebraic signs of the cycles paired are alike; it will be 
-1 when the algebraic signs are unlike. 4 In all other 
cases the coefficient will take numerical values ranging 

1 Stated in geometric terms: the square is the rectangle of maxi- 
mum area for a given perimeter. The proof is simple. Given a, to find 
the value of x which gives a maximum value of x(a - x), differen- 
tiate with respect to x, equate to zero and solve, getting $x = a/2$. 

2 The reader may need to be reminded that both series are supposed 
to be in terms of their standard deviations. That is, the sum of the 
squares of the cycles for all the series is the same. The absolute sums 
of a like number of the items for all series in such case do not vary 
widely, the range being from 124 to 146. For instance, the absolute 
and algebraic sums of the cycles of various series for the period 1903-16 
are given by the following table: 



                                           Sum of Cycles, 1903-16
     Series                                Absolute    Algebraic    Arithmetic Average

  1. New York Clearings                     133.1        + .5           +.003
  2. Production of Pig Iron                 145.9        + .5           +.003
  3. Outside Clearings                      124.4        -1.2           -.007
  4. Bradstreet's Prices *                  113.2        +3.4           +.024
  5. Imports                                125.2        +2.6           +.016
  6. Building Permits                       132.9        + .5           +.003
  7. Railroad Gross Earnings                134.7        +2.9           +.017
  8. Shares Traded                          122.3        +6.5           +.040
  9. Unfilled Orders U. S. S. C.             48.6 †      -1.6           -.029
 10. Tonnage Entered                          ‡            ‡              ‡
 11. Business Failures (Bradstreet)         130.6        +4.8           +.029
 12. Yield of 10 Railroad Bonds             137.6        - .2           -.001
 13. Rate on Four-to-six Months Paper       134.2        +2.0           +.012
 14. Rate on Sixty-to-ninety Day Paper      141.0        -1.0           -.006
 15. Rate on Call Loans, Mitchell             ‡            ‡              ‡

* 1903-14 for (4) Bradstreet's Prices. † Quarterly cycles. ‡ Cycles not computed. 

3 Since the cycles are deviations from the linear secular trend, so 
that their algebraic sum is zero, and since the standard deviation of 
each series is unity, the coefficient of correlation above defined is the 
well-known Pearsonian coefficient. It was invented by Bravais in 
1846 and developed and applied by Galton, Pearson, Edgeworth, 
Yule, and others. See the section of this paper entitled " General 
Considerations Relating to the Measurement of Correlation," pp. 
130, for a brief history of the coefficient with reference to important 
memoirs. 

4 If x and y represent the deviations of the items of two time series 
from their respective linear secular trends, n stands for the number of 
pairs, r represents the coefficient of correlation, and Σ stands for " the 
sum of such terms as," we have: 

$$r = \frac{1}{n}\sum\left(\frac{x}{\sigma_x}\cdot\frac{y}{\sigma_y}\right) = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$$

where $\sigma_x$ and $\sigma_y$ denote the standard deviations of x and y. 
For maximum positive correlation x = y and r, therefore, becomes 
equal to +1. For maximum negative correlation x = -y and 
r = -1. Consequently, the limits of possible values of r are +1 and 
-1. See Bowley, Elements of Statistics, p. 319, for a rigorous proof 
that $-1 \le r \le +1$. 
between 0 and 1. The zero of the scale occurs for 
the case in which the pairing of the items of any two 
series is made at random. That is, if we should pair 
a given series, say, the production of pig iron, with a 
series of cycles constructed by taking as its items the 
numbers of the pages of a book opened at random, we 
should expect the coefficient of correlation to be ap- 
proximately zero. Or, to cite another illustration of the 
meaning of the zero of the correlation scale, suppose 
that the numerical values of the monthly cycles of the 
production of pig iron are written on separate slips of 
white paper and deposited in a hat. Suppose, further, 
that the numerical values of the monthly cycles of the 
interest rate on sixty-to-ninety day commercial paper be 
written on separate slips of yellow paper and deposited 
in another hat. Now suppose that a white and a yellow 
slip are drawn simultaneously from the two hats and 
that the product of the two numbers drawn is taken; 
the process being continued until all the slips are thus 
drawn and the products of each pair taken. From the 
random character of the selection, we should expect 
some products to be positive, some negative; further, 
we should expect an approximately equal number of 
positive and negative products. In other words, we 
should expect the arithmetic mean of the products, i. e., 
the coefficient of correlation, to be numerically small, 
approximately zero. 1 

Chart E. Coefficients of correlation between cycles of pig-iron pro- 
duction and interest rate on sixty-to-ninety day commercial paper, 
with 0, 3, 4, 5, 6, 7, 8, 9, and 12 months lag of interest rate. (The 
maximum correlation, r = +.75, is for a lag of five or six months.) 
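
The experiment with the two hats can be imitated on a computer. The sketch below is only an illustration under assumptions of our own: the two sets of "slips" are artificial, normally distributed numbers (not the actual cycles), 139 slips are used because that is the number of months from January 1903 to July 1914, and the seed and the count of 500 repetitions are arbitrary.

```python
import random
from math import sqrt

def standardize(series):
    """Put a series in units of its own standard deviation, measured from its mean."""
    n = len(series)
    mean = sum(series) / n
    sd = sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd for v in series]

def mean_product(x, y):
    """Coefficient for two standardized series: arithmetic mean of paired products."""
    return sum(a * b for a, b in zip(x, y)) / len(x)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 139                  # months from January 1903 to July 1914
white = standardize([rng.gauss(0, 1) for _ in range(n)])   # stand-in for the white slips
yellow = standardize([rng.gauss(0, 1) for _ in range(n)])  # stand-in for the yellow slips

coefficients = []
for _ in range(500):
    rng.shuffle(yellow)  # draw the yellow slips in a fresh random order
    coefficients.append(mean_product(white, yellow))

average = sum(coefficients) / len(coefficients)
largest = max(abs(r) for r in coefficients)
print(round(average, 3), round(largest, 3))
```

The coefficients from random pairings cluster about zero, and none of them approaches the value found for the actual pairing of pig-iron production and interest rate.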

Let us now return to the question, what is the degree 
of correlation between the cycles of pig-iron production 
and those of interest rate on sixty-to-ninety day com- 
mercial paper six months later ? The arithmetic average 
of the products of corresponding items is +.75. We 
shall adopt $r_{0,6}$ as the symbol to indicate the coefficient 
of correlation in question. The subscripts indicate that 
the items of the former series are paired with items of 
the latter series six months later. 

The limiting values of the scale of correlation are 
then +1, 0, and -1. The first value, +1, indicates 
that the two series of cycles are identical; the second 
value, 0, indicates a complete lack of correspondence 
between the series, such as would occur from pairing 
items at random; the third value, -1, indicates that 
the items of the two series are numerically identical but 
opposite in sign. What do intermediate values indi- 
cate ? In other words, what is the meaning of such 
coefficients as +.04, +.37, +.48, +.80, +.96 or their 
negatives ? It is clear from what has been said that 
these coefficients indicate that the pairs of items yielding 
them have not been selected at random; nor are they 
identical. As it happens, the coefficients named have 
been computed from pairs of biometric items. The 
first, +.04, is the coefficient of correlation for breadth 
of head and ability of human beings. Being approxi- 
mately zero it indicates that the pairing in nature occurs 
at random. The remaining coefficients indicate more 
or less system in the pairing, the degree of system or 
correspondence being indicated by the size of the coeffi- 
cient. The coefficient +.37 is for length of forearm and 
stature; +.48 is for weight and stature; +.80 is for 
length of femur and stature; +.96 is for length of right 
and left femur. 2 

But is the correlation a maximum for a lag of six 
months in interest rate ? Computing the coefficients 
of correlation between pairs with a lag in interest rate 
of 0, 3, 4, 5, 6, 7, 8, 9, and 12 months, we have the 
following results: 

$r_{0,0}$  $r_{0,3}$  $r_{0,4}$  $r_{0,5}$  $r_{0,6}$  $r_{0,7}$  $r_{0,8}$  $r_{0,9}$  $r_{0,12}$
  +.34       +.67       +.72       +.75       +.75       +.73       +.70       +.65       +.45



These coefficients indicate that the maximum correla- 
tion is for a lag of four-to-eight months in interest rate. 
The fit of the cycles of pig-iron production and those of 
interest rate six months later is shown in Chart E. 
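
The search for the lag of maximum correlation may be sketched as follows. The data here are synthetic and purely illustrative (a forty-month wave and a copy of it delayed six months), and the function names are hypothetical. Only the list of lags tried is taken from the text.

```python
from math import sin, pi, sqrt

def correlation(x, y):
    """Pearson coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = sqrt(sum((v - my) ** 2 for v in y) / n)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)

def lagged_correlation(standard, correlative, lag):
    """Pair each item of `standard` with the item of `correlative` `lag` periods later."""
    if lag == 0:
        return correlation(standard, correlative)
    return correlation(standard[:-lag], correlative[lag:])

# Synthetic stand-in data: a 40-month wave and a copy delayed by six months.
months = range(139)
leader = [sin(2 * pi * t / 40) for t in months]
follower = [sin(2 * pi * (t - 6) / 40) for t in months]

lags = [0, 3, 4, 5, 6, 7, 8, 9, 12]
table = {lag: lagged_correlation(leader, follower, lag) for lag in lags}
best_lag = max(table, key=table.get)   # six months, by construction of the data
for lag in lags:
    print(lag, round(table[lag], 2))
```

Each additional month of lag shortens the paired stretch by one item, which is why the number of pairs differs from one coefficient to the next.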

Having determined the maximum correlation be- 
tween the two series, there remains to be considered 
one further question. Is +.75 a low or a high figure in 
the correlation scale ? Does +.75 indicate a significant 
likeness between the two series compared or is it a 
likeness that might readily occur accidentally ? The 
general question of the significance of the values of the 
correlation scale has been considered non-technically 
in the preceding pages. To answer our specific ques- 
tions fully and rigorously would lead us into the intri- 
cacies of the mathematics of the theory of probability, 
an unnecessary digression. 3 Some further consideration 
of the general question, however, is advisable. Greater 

1 We may remind the reader that we are dealing with series of 
cycles which have an approximately equal number of positive and 
negative items. Further, the frequency of small cycles is greater than 
the frequency of large cycles, so that we expect a larger number of 
small products than of large products. In other words, given a normal 
distribution of the items of each series, there will follow a normal dis- 
tribution of the products of random pairs. 

2 The meaning of the various values in the correlation scale, as of 
those in any scale, the centigrade scale of temperatures, for instance, 
can only be " sensed " by repeated experience. 

3 For a mathematical discussion of the theory of probability see, 
for instance, Arne Fisher, The Mathematical Theory of Probabilities, 
vol. i (1915), " Mathematical Probabilities and Homograde Statis- 
tics," and L. D. Weld, Theory of Errors and Least Squares (1916). 
precision must be given to the concept of " expectation," 
a concept fundamental to a proper understanding of 
the significance of the correlation scale. It is clear that 
a small amount of correlation and a small coefficient 
might be expected to occur between two series chosen 
quite at random. It is also manifest that although a 
high coefficient for two series chosen at random is 
possible, it is not probable. Expressed otherwise, the 
larger the coefficient of correlation the smaller is the 
probability that it has resulted accidentally. 

The probability that a given coefficient, say +.60 or 
+.65, may occur for two series chosen at random does 
not depend exclusively upon the size of the coefficient. 
It depends as well upon the number of pairs of items 
which have been used in computing the coefficient. 
For instance, a coefficient based upon 12 pairs does not 
carry the same weight, it is not so significant of persist- 
ent relationship as one based upon a larger number of 
pairs, say 130. Thus, the coefficient of correlation for 
monthly pig-iron production for the calendar year 1904 
and monthly interest rate on sixty-to-ninety day com- 
mercial paper for 1905, twelve months lag and 12 pairs 
of items, is +.61. The coefficient for the same series 
and nine months lag but for pairs of items between 
the dates January 1903 and July 1914, 130 pairs, is 
+.65. The latter coefficient (+.65) is much more 
significant than the former (+.61), not because of the 
numerical difference between them, but because of the 
wider basis of the coefficient (130 pairs as compared 
with 12). Using a twelve months lag for the period 
January 1903 to July 1914, 127 pairs, the coefficient is 
+.45. This result, considered in connection with the 
result of +.65 previously obtained, means that a twelve 
months lag is not nearly so persistent during the entire 
period under discussion as a nine months lag. 

In judging the significance of coefficients, then, the 
fundamental notions are the size of the coefficient and 
the number of pairs of items utilized. Expressed in 
mathematical terms, the probability that a given coeffi- 
cient has resulted from totally unconnected causes is a 
function of the coefficient and the number of pairs of 
items used in its computation. The form of this func- 
tion has been derived. The value of the function which 
is most widely used is called the " probable error." 1 
The principles underlying the notion of probable error 
are the same as those underlying games of chance. 
Suppose we are asked, if a coin be tossed what is the 
probability that it will fall heads ? The answer comes 
readily, one chance out of two. Our knowledge of the 
shape of coins and our experience with gravitation give 
us confidence in our prediction. The " subjective prob- 
ability " is one-half. In other words, if the coin is 
circular, if it has two flat faces, if it has a narrow edge, 
if it is evenly weighted, if it is tossed without control, 

1 See " Regression, Heredity, and Panmixia," by Karl Pearson, in 
Transactions of the Royal Society, 187 (1896), A, p. 266, and 191 (1898), 
pp. 229-311, especially p. 242. See also, Yule's Introduction to the 
Theory of Statistics, 1st ed., p. 348. 

2 R. H. Hooker, " An Elementary Explanation of Correlation," 
Quarterly Journal of the Royal Meteorological Society, October 1918, pp. 
285-286. 



and if gravitation operates as it has been observed to 
operate in the past, the probability that the coin will 
fall heads is one-half. Similarly, if we place seventy 
white balls and thirty black balls in a bowl, mix 
them thoroughly, and then draw one at random, the 
" subjective probability " of drawing a white ball is 
seven-tenths. The known conditions of the processes of 
tossing coins and drawing balls enable us to predict with 
confidence before making experiments in the particular 
cases. 

It is not difficult to decide upon the probability of an 
occurrence of an event in such simple games of chance 
as we have just considered. But for complex economic 
problems the situation is quite different. The condi- 
tions of economic processes are not known, or at any 
rate they are not known accurately enough for us to 
decide upon the probability of an event depending upon 
them. It is possible to arrive at the probability of 
occurrence for an economic event only by use of the 
actual record of the happening of events in the economic 
world. We cannot set up an experiment in economics; 
we must draw our conclusions and state our probabili- 
ties on the basis of evidence that is yielded automati- 
cally, so to speak. To draw a parallel between economic 
events and the process of drawing black and white balls 
from a bowl, the problem of the determination of a 
" probability " in economics is similar to the problem 
of ascertaining the ratio between the unknown numbers 
of black and white balls in a bowl based upon a record 
of sample drawings of, say, ten balls at a time. In other 
words, just as we may estimate the relative number of 
black and white balls in a bowl from a record of experi- 
ments, so we may determine an " objective probability " 
from available economic statistics. 
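
The parallel of the bowl may be made concrete with a short simulation. It is a sketch only: the bowl of seventy white and thirty black balls is borrowed from the earlier illustration, while the seed and the figure of 200 recorded drawings are arbitrary assumptions.

```python
import random

rng = random.Random(1)                    # fixed seed for repeatability
bowl = ["white"] * 70 + ["black"] * 30    # composition treated as unknown to the observer

white_drawn = 0
total_drawn = 0
for _ in range(200):                      # 200 recorded drawings of ten balls each
    sample = rng.sample(bowl, 10)         # ten balls drawn together, then returned
    white_drawn += sample.count("white")
    total_drawn += len(sample)

# The recorded proportion serves as the estimated probability of a white ball.
estimate = white_drawn / total_drawn
print(round(estimate, 3))
```

The estimate rests wholly on the record of drawings, as an economic probability must rest on the record of events.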

To return to the illustration of coin tossing: when 
we say, " the probability is one-half that the coin will 
fall heads," we do not mean that if we actually toss the 
coin 1,000 times we shall get exactly 500 heads and 500 
tails. In this connection, we shall quote the explanation 
given by R. H. Hooker: 2 

"If we toss for a thousand times we shall almost 
certainly not get exactly 500 heads and 500 tails: we 
should get somewhere near those figures, and we should 
be very much surprised if we got 600 heads and only 
400 tails. So great a difference as a proportion of 3 : 2 
is very unlikely to occur, and if it did occur we should 
suspect very strongly that the tosser was able to exer- 
cise some (perhaps small) influence over the coin. Yet 
if we only toss five times, the nearest approach to equal- 
ity that we can possibly get is 3 : 2 ; and if we toss ten 
times we are almost as likely to get 6 : 4 as 5 : 5. 
Thus a distinct inequality in the case of a small number 
of observations is of no significance, whereas it is im- 
portant in the case of a large number. Similarly, a dis- 
tinct correlation coefficient is no evidence of connection 
if the number of observations be small, but the same 
figure is if they be numerous. With a given coefficient, 
and a knowledge of the number of observations on 
which it is based, the probability of its being a true con- 
nection can always be calculated, and the ' probable 
error ' has been found to be the most useful guide. In 
practice it has been found absolutely safe to say that 
there is certainly a connection between two phenomena 
if the correlation coefficient be at least six times its 
probable error. This represents odds of many thousands 
to one in favour of the connection being real. . . . 

" Perhaps [an] example regarding probability may 
be cited as an illustration. It is well known that the 
bank at Monte Carlo ' always ' wins, and that a very 
steady revenue is derived therefrom. Yet it would 
seem, to judge by the rules of the play, that the odds in 
favour of the bank are only very slight. Slight as they 
appear, however, they tell in the long run, owing to the 
large number of times the game is played. But still it is 
not really an absolute certainty; a particular individual 
may upon occasion ' break the bank,' and therefore it is 
quite conceivable that such an operation may occur so 
frequently that the bank will have lost by the end of the 
year. With a correlation coefficient six times its prob- 
able error, the chances of there being no connection 
are far more remote than those of a given individual 
breaking the bank at Monte Carlo. 

" Thus, whatever the correlation coefficient (if not 
unity) there is always a remote chance, it may be 
many millions to one, that the connection is acci- 
dental. But it is always strictly true that the coefficient 
measures the resemblance that has actually held for 
the observations included in the calculations. It is 
only when we want to know whether the connection is 
causal, and, as a corollary, whether it will still hold in 
the future, that we have to consider questions of prob- 
ability. And it may perhaps not be amiss to remark 
that when a writer points out, as frequently occurs, that 
' inspection of the diagram shows that ups and downs 
correspond fairly, and therefore the two sets of phe- 
nomena are interdependent,' such a writer has fre- 
quently quite small odds in his favour." 

Chart F. The curve A' B' C' D' E' is the normal curve of error of a coefficient of correlation for samples of 100 pairs of items. 

A coefficient of correlation computed from a limited 
series of economic data has, then, a margin of error. 
Expressed in other terms, any given coefficient of corre- 
lation is an attempt to estimate, from a sample, relation- 
ship, and that estimate has a probable error. As we 
have said, the probable error " is a function of the coeffi- 
cient and the number of pairs of items used in its com- 
putation." Stated more precisely, the probable error 
of a given correlation coefficient is that divergence, on 
the average, on either side of the coefficient, within 
which exactly half the values of a large number of 
coefficients would lie if computed from pairs of items 
chosen at random from a universe having, in general, 
the given degree of correlation. 1 In other words, sup- 
pose there are billions of pairs of items with a correlation 
coefficient of +.60. From these billions suppose we 
take at random 100 items and compute a coefficient. 
We then take other random samples of 100 pairs and 
compute the coefficients. These coefficients will not all 
be +.60, but will be scattered symmetrically about 
+.60 according to the " normal curve of error." That 
is, there will be an equal number of divergences above 
and below +.60, and the number of divergences in 
excess or defect will decrease as the size of the diver- 
gence increases. The shape of the normal curve of error 
is bell-like, as illustrated by the accompanying diagram 
giving the normal curve of error of a coefficient of cor- 
relation based upon 100 pairs of items (Chart F). Point 
D on the diagram corresponds to the coefficient of +.64 
and point B to one of +.56. 2 The area bounded by any 
two ordinates, as CC' and DD', the base line, CD, and 
the curve, C'D', corresponds to the number of coeffi- 
cients we may expect to get between +.60 and +.64. 
The area, BB'D'DB, is one-half of the area included 
between the curve A'C'E' and the base line ACE. 
The distances CB and CD represent the probable error, 
that is, the chances are equal that the coefficient com- 
puted from a given sample of 100 items will fall between 
the values +.56 and +.64. Similarly, the probability 
is about 68 to 32, or two to one, that the coefficient of 
such a sample will fall between +.54 and +.66; it is 19 
to 1 that the coefficient will fall between +.47 and 
+ .73; it is 997 to 3 that the coefficient will fall between 
+.41 and +.79. 
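
The limits just quoted follow from the expression for the probable error, 0.674 (1 - r²)/√n, together with the fact that the probable error of a normal curve is 0.674 of its standard deviation. A minimal sketch in Python, with illustrative names:

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of a coefficient r based on n pairs: 0.674 * (1 - r**2) / sqrt(n)."""
    return 0.674 * (1 - r ** 2) / sqrt(n)

r, n = 0.60, 100
pe = probable_error(r, n)
sigma = pe / 0.674   # standard deviation of the normal curve of error

print(round(pe, 3))  # 0.043
limits = {}
for odds, width in [("even chances", pe), ("about 2 to 1", sigma),
                    ("19 to 1", 2 * sigma), ("997 to 3", 3 * sigma)]:
    limits[odds] = (round(r - width, 2), round(r + width, 2))
    print(odds, limits[odds])
```

The four pairs of limits printed are those given in the text: .56 to .64, .54 to .66, .47 to .73, and .41 to .79.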

As has been said, the probable error depends on the 
size of the sample. For a coefficient of +.60 based upon 
25 pairs of items the probable error is .087; for 1,000 
pairs of items it is .014. That is, for a sample of 
1,000 pairs taken from a universe having the coefficient 
of +.60 the chances are equal that the coefficient com- 
puted from the sample will be between +.586 and 
+.614. Chart G, drawn on the same scale as Chart F, 

1 The expression for the probable error of the coefficient of corre- 
lation, r, is $0.674\,(1 - r^2)/\sqrt{n}$, where n indicates the number of items paired. 

In case the number of items is very large the probable error will be 
approximately zero. The word " universe " is used here to describe 
the complete homogeneous group of pairs. 

2 These values, +.64 and +.56, are obtained by substituting 
r = +.60 and n = 100 in the expression for the probable error, 
$0.674\,(1 - r^2)/\sqrt{n}$, finding the value .043, then adding and subtracting .04 
to and from +.60. The coefficient and its probable error are usually 
written +.60 ± .04. 






shows the distribution of the coefficients of correlation 
for samples of 1,000 items. Comparison of the two 
charts shows that the dispersion of the coefficients computed 
from samples decreases rapidly as the size of the 
sample increases. 

Chart G. The curve A' B' C' D' E' is the normal curve of error of a 
coefficient of correlation for samples of 1,000 pairs of items. 
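
The narrowing of the curve of error with larger samples can be seen by drawing samples from an artificial universe. The sketch below assumes a normal universe with a coefficient of +.60, constructed for the purpose. The seed, the 400 samples of each size, and the names are arbitrary choices of ours.

```python
import random
from math import sqrt

def correlation(x, y):
    """Pearson coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = sqrt(sum((v - my) ** 2 for v in y) / n)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)

def sample_coefficient(rng, rho, n):
    """Coefficient computed from n random pairs of a normal universe with correlation rho."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    return correlation(xs, ys)

rng = random.Random(2)        # fixed seed for repeatability
rho, trials = 0.60, 400
spread, inside = {}, {}
for n in (25, 100, 1000):
    pe = 0.674 * (1 - rho ** 2) / sqrt(n)
    rs = [sample_coefficient(rng, rho, n) for _ in range(trials)]
    mean_r = sum(rs) / trials
    spread[n] = sqrt(sum((r - mean_r) ** 2 for r in rs) / trials)   # dispersion of the coefficients
    inside[n] = sum(abs(r - rho) <= pe for r in rs) / trials        # share within one probable error
    print(n, round(pe, 3), round(spread[n], 3), round(inside[n], 2))
```

Roughly half of the sample coefficients should fall within one probable error of +.60, and their dispersion shrinks as the sample grows.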

The coefficients of correlation for pig-iron production 
and interest rate on sixty-to-ninety day commercial pa- 
per for various degrees of lag, plus or minus the respec- 
tive probable errors computed from the formula just 
given, are as follows: 
The probable errors, considered in connection with the 
coefficients of correlation, enable us to judge the signifi- 
cance of the coefficients. Let us recall the general rule 
already cited that " when r is greater than, say, six 
times its probable error, we may be practically certain 
that the phenomena are not independent of each other, 
for the chance that the observed results would be 
obtained from unconnected causes is practically zero." 1 
Applying this rule to the present case, we are safe in 
concluding, first, that there is a significant degree of 
positive correlation for pig-iron production and inter- 
est rate on sixty-to-ninety day commercial paper, and 
second, that interest rate lags behind pig-iron produc- 
tion five or six months. 
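
The rule of six times the probable error can be applied mechanically to the coefficients quoted earlier in this section. The following sketch recomputes probable errors on the assumption that the number of pairs is 139 (the months from January 1903 to July 1914) less the lag. This agrees with the 130 pairs stated for a nine months lag and the 127 pairs stated for a twelve months lag, but for the other lags it is our assumption, and the printed figures are not offered as the author's tabulated values.

```python
from math import sqrt

# Coefficients for each lag of interest rate (months), as quoted in the text.
coefficients = {0: 0.34, 3: 0.67, 4: 0.72, 5: 0.75, 6: 0.75,
                7: 0.73, 8: 0.70, 9: 0.65, 12: 0.45}
MONTHS = 139   # January 1903 through July 1914

def probable_error(r, n):
    """Probable error of a coefficient r based on n pairs."""
    return 0.674 * (1 - r ** 2) / sqrt(n)

ratios = {}
for lag, r in coefficients.items():
    n = MONTHS - lag               # assumed number of pairs for this lag
    pe = probable_error(r, n)
    ratios[lag] = r / pe           # the rule asks that this exceed six
    print(f"lag {lag:2d}: r = {r:+.2f} ± {pe:.2f}   r / PE = {ratios[lag]:.1f}")
```

On these assumptions every coefficient exceeds six times its probable error, the coefficients for five and six months lag by the widest margin.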

Comparison of the cycles of pig-iron production and 
interest rate and also of various other series has indi- 
cated that it would be unprofitable to measure the lag 
of one series as compared with others in finer time units 
than two months. Consequently, coefficients of corre- 
lation have been computed from averages of items for 
two-month periods, January and February, March and 
April, May and June, July and August, September and 
October, November and December. 

1 Bowley, Elements of Statistics, 2d ed., p. 320. 
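
The combination of monthly items into two-month averages is simple to state in code. The function name and the sample year below are hypothetical, and the cycle values are invented for illustration, not taken from the tables.

```python
def bimonthly_averages(monthly):
    """Average consecutive pairs of monthly items (January-February, March-April, ...)."""
    if len(monthly) % 2:
        raise ValueError("expected an even number of monthly items")
    return [(monthly[i] + monthly[i + 1]) / 2 for i in range(0, len(monthly), 2)]

# Hypothetical cycle values for one calendar year.
year = [0.3, 0.2, 0.3, 0.5, 0.7, 0.9, 0.5, 0.5, 0.1, -0.1, -0.4, -0.6]
bimonthly = [round(v, 2) for v in bimonthly_averages(year)]
print(bimonthly)
```

Twelve monthly items become six bimonthly items, so that a lag measured on the combined series is determined only to the nearest two months.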

The method followed, then, of ascertaining the 
degree and nature of the correlation between various 
pairs of series is as follows: 

(a) The graphs of cycles of business statistics were 
charted on tracing paper and then compared 
by placing one over the other on the glass top of 
an illuminated box. This preliminary compari- 
son was made by three independent observers. 
Each observer recorded, first, the degree (high, 
moderate, low) and sign (+ or — ) of the corre- 
lation for the closest fit of the two graphs; 
second, the amount of lag in months for the 
maximum correlation; and third, the consist- 
ency or systematic character of the lag. The 
results thus secured were used as a guide in 
further studies of correlation. 
(b) Coefficients of correlation were computed be- 
tween those pairings of items which, from the 
preliminary comparison, appeared likely to re- 
sult in maximum correlation. In order to avoid 
unnecessary computation the monthly items 
were combined to make bimonthly items. 
The lag, as shown by the highest coefficient 
of correlation, was thus measured to the nearest 
two months. The degree of correlation was not 
considered noteworthy unless r was upwards of 
+.50. 

Having adopted a method of measuring correlation 
between the cycles of any two series, the next steps are 
to classify all the series with respect to correlation and 
lag, to decide what series constitute good indices of 
business conditions, and to construct from them one or 
more synthetic indices. 

3. The Groups of Series : Construction of Indices 
of General Business Conditions 

Comparison of the cycle chart of each of twenty 
series with the cycle charts of the remaining nineteen, 
as described in the preceding section, led to certain 
important conclusions. These conclusions are as fol- 
lows: 

The judgments of the three independent observers 
concerning the sign and degree of correlation, and 
approximate lag for best " fit " of the curves were, in 
the main, consistent. There were differences of opinion 
in several cases, however, as to the exact lag for maxi- 
mum correlation. In a few cases there was uncertainty 
as to whether the best fit of the two cycle charts was 
positive for a lag in one direction or negative for a lag 
in the other direction. Furthermore, there was not 
always unanimity of opinion concerning the degree of 
correlation even in the rough classification of " high, 

Statistics, 2d ed., p. 320. 
moderate, or low." Sometimes, for instance, one ob- 
server would decide that the correlation between two 
graphs was " high " and the other observers would call 
it " moderate " or " moderate to high." In any case, 
the three-term classification (high, moderate, low) 
could only indicate the degree of correlation very 
roughly. Consequently, it was impossible, on the basis 
of the graphic comparison alone, to arrange or " array " 
the pairs of series compared in order of the magnitude 
of the correlation between the various pairs. The 
graphic comparison could be utilized, therefore, merely 
as a guide for further investigation of the questions at 
issue. In all cases it was clear that correlation coeffi- 
cients must be computed before a final decision could 
be reached. In other words, decisions concerning the 
nature and degree of correlation between pairs of series 
were actually based upon graphic comparison as a first 
approximation and the coefficients of correlation as a 
second approximation. 1 

Comparison of the series over the illuminated box 
was an essential part of the procedure. It showed that 
the cycles of the several series are related but the wave 
movements are neither synchronous nor always posi- 
tively correlated; that there are various degrees of 
correlation and lag; that the waves in the charts do not 
occur simultaneously but consecutively; that for some 
pairs the lag is approximately constant, for other pairs 
variable. The graphic comparison supplemented by 
coefficients of correlation makes it possible to arrange 
the series in order corresponding to the time of occur- 
rence of the crests and troughs of the waves. 

The systematic movements of the series, with refer- 
ence to each other, the nature of the correlation between 
pairs of series, and the length and direction of the lag 
were definitely interrupted with the outbreak of the 
war in August 1914. It may be said that, in general, 
the order of the series as to lag was completely reversed. 
That is to say, series such as commodity prices and 
interest rate, which usually rise and fall four to eight 
months after bank clearings and pig-iron production, 
were the first series to feel the effects of the economic 
cataclysm caused by the declaration of war. The value 
of building permits, which usually forecasts fluctuations 
in commodity prices and pig-iron production, lagged 
behind these series in the upward movement of 1915-16. 
Another interference with the orderly movement of the 
series came in the spring of 1917 when the United States 
entered the war. After that time the value of building 
permits ceased to be an index of current or prospective 
industrial activity. As we have seen, the slackening of 
production in the winter of 1917-18, resulting from 
transportation congestion and the fuel situation, is 
revealed by the marked temporary fall in bank clear- 
ings and pig-iron production. In short, the disturbance 
due to the war completely upset the orderly routine of 

1 A table giving the reconciled judgments of three observers is 
given on pp. 184-187 following. All the coefficients of correlation com- 
puted are given on pp. 182, 183. 

2 The reader is referred to Part I, Section 4, for a discussion of the 
statistical effects of the transition from peace to war and war to peace. 



peace times. 2 In consequence of these considerations, 
comparison of the various series with each other for 
correlation and consistency of fluctuations is limited in 
the present issue of the Review to the period ending 
July 1914. 

The conclusions, just stated, bear directly on the 
question which we now propose to discuss. Is it legiti- 
mate to combine several series of business statistics into 
a synthetic index of business conditions, the fluctuations 
of which will measure, in normal peace times, more 
accurately than a single series, the intensity and ramifi- 
cation of industrial activity ? In short, is it legitimate, 
for our purpose, to average several series ? 

It has been customary for commercial statistical 
agencies to average a number of statistical series to 
secure a " business barometer " or a " compositplot." 
For instance, one agency uses the following combina- 
tions: 

1. A graph to show the price trend of thirty-two 
leading stocks. 

2. A business graph based upon the statistics of 
bank clearings, railroad earnings, pig-iron pro- 
duction and prices, commodity prices, imports, 
building, and immigration. 

3. A banking graph based upon reserves, deposits, 
the rate of commercial paper, the percentage of 
loans to deposits, and the percentage of reserves 
to loans. 

Another agency combines the following groups of 
series and presents a single graph: 

1. A group indicating mercantile conditions, based 
on statistics of immigration, new building, com- 
mercial failures, and bank clearings. 

2. A group indicating monetary conditions, based 
on commodity prices, foreign trade, foreign 
money rates, and domestic money rates. 

3. A group indicating investment conditions, based 
on conditions of leading crops, railroad earn- 
ings, political factors (estimated), and prices of 
stocks. 

In neither case is any justification offered for the 
choice of series to be combined; in neither case is suffi- 
cient information given to enable one desiring so to do, 
to check up the computation. All that can be said is 
that available series are averaged by some process or 
other. 

Inspection of the lists of series combined by the com- 
mercial agencies to secure " business barometers " 
reveals the fact that a most miscellaneous selection of 
series is averaged. No criterion of homogeneity of 
series is mentioned; apparently none is even conceived 
to be necessary. Our preliminary survey indicates that 
the series entering into the business barometer have 
dissimilarities no less marked than their similarities. 
Even though all possess wave movements, such move- 
ments differ in amplitude, intensity, and time of ebb 
and flow. To combine such series, it would appear, is to 
blur the picture and conceal its meaning, not to make 
its outline distinct and its message unmistakable. 
Clearly, the first problem encountered in constructing 
a synthetic index of business conditions is to sort the 
various series according to (a) correlation, and (b) lag. 
In the preceding section we have described the method 
adopted of comparing series. Comparison of the cycles 
of the various series according to that method led to 
the conclusion that wave movements occur in the fol- 
lowing order: 

  I.   12. Rate (yield) on ten railroad bonds. 
       20. Price of twenty railroad stocks. 
       18. Price of industrial stocks. 

  II.   6. Building permits. 
        1. New York clearings (standard for group). 
        8. Shares traded. 

  III.  2. Production of pig iron (standard for group). 
        3. Outside clearings. 
        5. Imports. 
        9. Unfilled orders U. S. S. C. 
       11. Business failures. 

  IV.   4. Bradstreet's prices (standard for all series). 
       16. Bureau of Labor prices. 
        7. Railroad gross earnings. 
       23. Reserves of New York banks. 

  V.   17. Dividend payments. 
       22. Loans of New York banks. 
       24. Deposits of New York banks. 
       13. Rate on four-to-six-months paper. 
       14. Rate on sixty-to-ninety day paper. 

It is probable that three questions come to the mind 
of the reader as he examines the list of series just given. 
The questions are: What is the reason for grouping the 
series in this peculiar fashion ? What is the criterion 
used for the arrangement of the individual series and 
groups of series, first to last ? What is the meaning of 
the phrases " standard for group " and " standard for 
all series " ? These questions will now be considered. 

The series of each group (I, II, III, IV, V) have 
cycles which move concurrently with the cycles of other 
series of the same group. That is, the maximum corre- 
lation for series of one group occurs when items for the 
same date are paired; moreover, the coefficient of cor- 
relation between any two series of a given group is 
significant. Thus, the coefficients of correlation among 
the series of Group II vary from +.52 to +.84; those 
of Group III vary from -.57 to +.82; and those of 
Group IV vary from -.72 to +.77. Table G (p. 183) 
gives the intra-group coefficients of correlation for con- 
current items and for two months lag for each series 
as compared with the other, showing that the maxima 
occur for concurrent items. 1 

What significance, if any, has the order of the groups 
in our list ? The series of Group II lag behind the series 
of Group I by two to four months; those of Group III 
lag behind those of Group II by two to four months; 
those of Group IV lag behind those of Group III by 
two to four months; while those of Group V lag behind 
those of Group IV by four to six months. Table H 



(p. 183) gives the inter-group coefficients of correlation 
for various pairings of selected series. 

The series labeled "standard for group" were used as 
representative series of each group in the computation 
of intra-group and inter-group coefficients of correla- 
tion. Bradstreet's prices were selected as the standard 
with which to correlate all series. This selection of 
standard series was made, mainly on the evidence 
offered by the cycle charts, in order to decrease the 
number of coefficients of correlation computed; it was 
thought unnecessary to compute the coefficients for 
every possible pair of the twenty series examined. 2 It is 
obviously desirable to select a standard with which to 
correlate various series. Three such standards were 
chosen — bank clearings of New York City, tonnage 
of pig iron produced, and Bradstreet's index of com- 
modity prices. One reason for this selection is that the 
cycles of these three series do not occur simultaneously. 
New York bank clearings were selected because the 
items fluctuate directly with the amount of credit in- 
struments originating with loans and speculations as 
well as the physical volume and prices of commodities 
bought and sold. Tonnage of pig iron was selected 
because it appears to be the best single index of physical 
productivity. Bradstreet's index of commodity prices 
was selected because it is a composite of ninety-six 
wholesale prices of important raw materials and manu- 
factured goods. 3 

The consensus of opinion of writers on business cycles 
is that such cycles are preeminently characterized by 
price movements, culminating in crises and reaching 
bottom in times of depression. 4 Consequently, coeffi- 
cients of correlation were computed for each of three 
pairings of items with Bradstreet's prices. The pairings 
made were as follows : 

1. Pairs which appeared to have the maximum 
correlation. 

2. Pairs obtained by shifting cycles two months to 
right. 

3. Pairs obtained by shifting cycles two months to 
left. 
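The three pairings listed above amount to computing the coefficient at the lag of apparent maximum correlation and at lags two months to either side of it. The following is a minimal sketch in Python, not the computation actually performed for Table F. The two series are hypothetical sine waves, the function names are our own, and the sign convention for the lag is an assumption stated in the comments.

```python
import math

def pearson(x, y):
    """Product-moment coefficient of correlation for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_r(standard, other, lag):
    """Pair standard[t] with other[t - lag].

    Assumed convention: a positive lag means the other series
    moves `lag` months ahead of the standard.
    """
    if lag >= 0:
        return pearson(standard[lag:], other[:len(other) - lag])
    return pearson(standard[:lag], other[-lag:])

# Hypothetical data: a 40-month wave, and a standard that repeats it 4 months later.
months = range(138)
other = [math.sin(2 * math.pi * t / 40) for t in months]
standard = [math.sin(2 * math.pi * (t - 4) / 40) for t in months]

# The apparent maximum (4 months) and the same pairing shifted two months each way.
coefficients = {lag: lagged_r(standard, other, lag) for lag in (2, 4, 6)}
best_lag = max(coefficients, key=coefficients.get)

for lag, r in coefficients.items():
    print(f"lag {lag} months: r = {r:+.2f}")
print("maximum correlation at lag", best_lag)
```

With these artificial series the coefficient is highest at four months and falls off on either side, which is the pattern the three pairings are designed to reveal.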

The coefficients of correlation for (a) Bradstreet's 
prices as the standard and (b) nineteen other series as 
correlatives are given in Table F (p. 182). These co- 
efficients, together with those of Tables G and H, give 
the requisite information for arraying the various series 
in order of the time of their wave movements and for 
grouping the series. The series of a given group may, 
legitimately, be averaged. We have computed five 

1 The number of groups is, of course, somewhat arbitrary. Like- 
wise, there are " border-line " series which might be assigned to either 
of two adjacent groups. 

2 There are 190 possible pairs of series among the twenty. If three 
coefficients are computed for each pair 570 coefficients would be found. 
Since there are 138 items in each series a total of nearly 80,000 prod- 
ucts would be required, a considerable task to compute, record, and 
add. 
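The count given in footnote 2 can be checked by a short calculation. The sketch below uses only the figures stated in the footnote; the variable names are our own. The product count is an upper bound, since shifting a series by two months leaves 136 rather than 138 pairs.

```python
from math import comb

series = 20               # series examined
pairings_per_pair = 3     # maximum-correlation pairing, plus two shifted pairings
items_per_series = 138    # monthly items in each series

pairs = comb(series, 2)                       # possible pairs among twenty series
coefficients = pairs * pairings_per_pair      # coefficients if every pair were computed
products = coefficients * items_per_series    # products of paired items (upper bound)

print(pairs, coefficients, products)  # 190 570 78660
```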

3 See paragraph on source and nature of data on p. 139 for a de- 
tailed description of this series. It is the plan of the Review to make 
an intensive and comparative analysis of this and other price indices 
in a later number. 

4 See Memoranda, p. 206 for the rôle of prices in the business cycle. 
synthetic indices of business conditions by averaging 
the concurrent cycles of the series composing each of the 
five groups. These synthetic indices, presented graph- 
ically, picture the ebb and flow of various phenom- 
ena connected with the business cycle. The crests or 
troughs of the waves do not occur simultaneously, but 
successively over a period of twelve to sixteen months. 
The synthetic indices interpreted with reference to 
each other and to the series composing each group con- 
stitute an index of general business conditions. 
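Averaging the concurrent cycles of a group may be sketched as follows. This is an illustration only. It assumes that each series is first expressed in units of its own standard deviation so that series of different amplitude can be combined; the three short series are hypothetical, and the function names are our own.

```python
import statistics

def standardize(series):
    """Express each item as a deviation from the mean in standard-deviation units."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [(value - mean) / sd for value in series]

def synthetic_index(group):
    """Average the standardized items of all series in a group, month by month."""
    standardized = [standardize(series) for series in group]
    return [statistics.fmean(month) for month in zip(*standardized)]

# Hypothetical cycles for three concurrent series of one group.
series_a = [1, 2, 3, 2, 1, 0]
series_b = [10, 30, 50, 30, 10, -10]   # same wave as series_a, larger amplitude
series_c = [0, 2, 3, 3, 1, -1]         # similar wave, not identical

index = synthetic_index([series_a, series_b, series_c])
print([round(value, 2) for value in index])
```

Because the items are paired by date, the average is meaningful only for series whose maximum correlation occurs for concurrent items, which is the criterion by which the groups were formed.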

In the preceding sections of Part II we have explained 
the technique adopted for correcting individual series 
of business statistics and for measuring the correlation 
between series. As a measure of correlation we have 
used the coefficient of the English school of biometri- 
cians. Questions arise in connection with our use of the 
coefficient to measure correlation between time series 
of economic statistics. The coefficient of correlation 
was devised for another purpose. Is the use of it that 
has here been made a legitimate one ? Furthermore, 
what, exactly, is the concept of correlation ? What is 
the relation between correlation and causation ? What 
is the nature of " economic law ? " These questions, 
general in their nature, will be considered in the follow- 
ing section. That section may be safely omitted by 
readers who are interested in the results of statistical 
processes rather than in an extended consideration of 
the concepts at the basis of our statistical technique. 

4. General Considerations Relating to the 
Measurement of Correlation 

Perhaps an accurate concept of correlation and the 
coefficient of correlation may best be secured by examin- 
ing briefly the development of the theory. The theory 
of correlation was originally developed by English 
scientists in connection with their studies of biological 
evolution. Sir Francis Galton deserves the credit for 
conceiving of a numerical measure of the intensity of 
correlation and for devising a practical statistical 

1 See Yule, " The Applications of the Method of Correlation to 
Social and Economic Statistics," in the Journal of the Royal Statistical 
Society, December, 1909, pp. 721-729, for a helpful summary of the 
history of the subject. This paper contains a selected list of thirty- 
nine memoirs on correlation arranged in chronological order under 
years of publication. The publications in which Sir Francis Galton 
first considered the problem of correlation with reference to the statis- 
tics of heredity are: " Family Likeness in Stature," Proceedings of the 
Royal Society, London, 40, p. 42 (1886), and appendix to the preceding 
paper by J. D. H. Dickson, ibid., p. 63; " Correlations and Their 
Measurement, etc.", Proceedings of the Royal Society, London, 45, 
p. 135 (1888); and Natural Inheritance, (1889). 

2 A. Bravais, " Analyse mathématique sur les probabilités des 
erreurs de situation d'un point," Académie des Sciences. Mémoires 
présentés par divers savants. IIe Série, T. 9, p. 255 (1846). 

3 Yule, Journal of the Royal Statistical Society, December, 1909, p. 
722. 

4 In reviewing the second edition of Czuber's Wahrscheinlichkeits- 
rechnung, in the Journal of the Royal Statistical Society, May, 1911, pp. 
643-647, Mr. J. M. Keynes says: " This must long remain the stand- 
ard treatise on the topics with which it deals. There is no work in 
English which covers the same ground, and it greatly excels in grasp 
and thoroughness the French treatises which challenge comparison 
with it. . . . But of the modern theory of correlation and of the 
central position which this now holds in English statistical theory, 



method to secure such a measure. 1 He has concisely 
stated his point of view in the sentence which has been 
adopted as the motto of the Francis Galton Eugenics 
Laboratory: " Until the phenomena of any branch of 
knowledge have been submitted to measurement and 
number it cannot assume the status and dignity of a 
Science." 

Sir Francis Galton, Karl Pearson, G. Udny Yule, 
and F. Y. Edgeworth deserve credit for developing the 
mathematical theory of correlation and for applying 
that theory. It should be said, however, that " the 
method of correlation is only an application to the 
purposes of statistical investigation of the well-known 
method of least squares. It is impossible, therefore, 
entirely to separate the special literature of the theory 
of correlation from that of the theory of error or of the 
method of least squares. Of the numerous memoirs on 
the theory of error the most important in the present 
connection is that of A. Bravais, 2 who as long ago as 
1846 discussed the theory of error for points in space, 
regarding the errors as either independent or corre- 
lated, from the standpoint of the normal law of errors. 
He did not, however, use a single symbol for a corre- 
lation coefficient, although the product-sum formula 
may be regarded as due to him, in a sense — i. e., the 
product sum is introduced into his expression for the 
frequency of various combinations of error." 3 Bravais, 
obviously, was interested in the theory of correlation as 
a branch of pure mathematics, while Galton and the 
English school were primarily interested in the applica- 
tion of the theory. 4 

It was Karl Pearson and those working with him in 
the Biometric Laboratory who are chiefly responsible 
for insisting that, in our studies of inheritance, " we 
must proceed by the method of statistics, rather than 
by the consideration of typical cases." 5 

In view of the fact that the theory of correlation was 
developed as an aid in the study of evolution it is natural 
that correlation should first have been defined in biological 
terms. Thus Pearson says: "Two organs in the 

there is no hint. . . . Professor Czuber's methods are in direct line 
of descent from those of the classical writers on Probability and Error, 
and they possess the style and lucidity which such a history naturally 
gives them. But the reader must feel that these methods have reached 
their limit of accomplishment, and that nothing very novel can result 
from attempts to perfect them further. Recent English contributions, 
on the other hand, fragmentary and often obscure or inaccurate 
though they now are, seem to have within them the seeds of further 
development and to carry the methods of mathematical statistics into 
new fields." The French have done very little in the field of 
correlation. For instance, the second edition of Poincaré's Calcul des 
Probabilités (1912) is characterized by Keynes as follows: "This book 
has no reference to any of the researches, either German or English, 
which seek by the union of Probability and Statistics to forge a new 
weapon of scientific investigation." (Journal of the Royal Statistical 
Society, December, 1912, p. 114.) The Italians and Russians have 
made some contributions to or applications of the theories of correla- 
tion developed by the English school. 

5 Pearson, " Mathematical Contributions to the Theory of Evolution, 
III. Regression, Heredity, and Panmixia," Philosophical Transactions 
of the Royal Society of London, A. 187, p. 255 (1896). See 
numerous articles in Biometrika for illustrations of the methods used. 
Recent writers have pointed out that the consideration of typical 
cases, as well as the method of statistics, gives valuable results. See 
Raymond Pearl, Modes of Research in Genetics (1915). 
same individual, or in a connected pair of individuals, 
are said to be correlated, when a series of the first organ 
of a definite size being selected, the mean of the sizes of 
the corresponding second organs is found to be a func- 
tion of the size of the selected first organ. If the mean 
is independent of this size, the organs are said to be 
non-correlated. Correlation is defined mathematically 
by any constant, or series of constants, which determine 
the above function. 

" The word ' organ ' in the above definitions of varia- 
tion and correlation must be understood to cover any 
measurable characteristic of an organism, and the word 
' size ' its quantitative value." 1 Pearson adds that 
" these organs may be in the same or in different indi- 
viduals, or partly belong to one and partly to another 
individual. The complex may be constituted by a 
natural or artificial tie of any kind, but the tie is to 
remain the same for every complex, whether it be the 
result of mating or parentage, or from any physiological 
or social relation, etc." 2 

To illustrate the concepts just stated, suppose that 
we are attempting to answer the question, do tall fathers 
have tall sons ? In this case, stature is the " measur- 
able characteristic " in each of " a connected pair of 
individuals." Suppose the average stature of all adult 
males is sixty-six inches; suppose we select several 
thousand fathers whose stature is seventy-two inches 
or more, six inches above the average for all, and find 
the mean stature of the sons of this group of tall fathers 
to be sixty-nine inches, three inches above the average 
stature of all adult males. If similar results appear 
consistently for selected fathers and their sons, we may 
conclude that the stature of sons depends upon the 
stature of fathers; or, in other words, the stature of 
sons is a function of the statures of fathers; or, in still 
other words, the statures of fathers and sons are corre- 
lated. We may be able to state a " law " of the in- 
heritance of stature, or give the stature of sons as a 
function of the stature of fathers. It is clear, however, 
that although tall fathers may, in general, have tall sons, 
an individual tall father may have a short son, or per- 
haps several sons, some tall, some short. That is, two 
concepts are involved; first, the law or function or 
equation expressing the relation on the average, existing 
between the two variables involved, and second, the 
degree with which individual cases adhere to the law. 

To illustrate the first concept, it may be possible to 
say that for an average deviation in stature of fathers 
of n inches from the mean for adult males, the stature 
of the sons of those fathers will deviate in the same 
direction by n/2 inches. This is a law or function, the 
first concept that we have named. 3 But the statement 
of the function does not describe the situation com- 
pletely. How accurately does the function describe the 
situation; how systematic is the relationship between 
statures of fathers and sons; are the exceptional cases 
few or many ? These are different forms of a question 
which requires a quantitative answer. Such an answer 



is given by the coefficient of correlation. The coefficient 
is unity if there is no exception to the law of statures; it 
is zero if the statures of father and son are independent 
of each other; it is negative if tall fathers, in general, 
have short sons; it has a numerical value varying in- 
versely with the degree of divergence (both in number 
of cases and magnitude) of the individual cases from a 
linear relationship. 
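The two concepts, the law and the degree of adherence to it, can be illustrated numerically. The sketch below uses hypothetical statures, not observed data. The slope of one-half follows the illustration in the text, in which fathers six inches above the average have sons three inches above it. The function names are our own.

```python
import math

def pearson(x, y):
    """Product-moment coefficient of correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope of y on x: the 'law' relating the two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

mean_stature = 66                      # inches, as assumed in the text
fathers = [60, 63, 66, 69, 72]         # hypothetical statures

# No exception to the law: every son deviates by half his father's deviation.
sons_exact = [mean_stature + (f - mean_stature) / 2 for f in fathers]
# Tall fathers have short sons, again without exception.
sons_inverse = [mean_stature - (f - mean_stature) / 2 for f in fathers]
# Statures of sons arranged so as to show no linear relation to those of fathers.
sons_unrelated = [67, 65, 66, 65, 67]

print(slope(fathers, sons_exact))        # the law: half the father's deviation
print(pearson(fathers, sons_exact))      # unity
print(pearson(fathers, sons_inverse))    # negative unity
print(pearson(fathers, sons_unrelated))  # zero
```

The slope states the law; the coefficient states how closely the individual cases adhere to it.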

It may be well to illustrate further the concepts just 
described. For instance, the compound interest law 
expresses the relationship between two variables, time 
and the amount of a given principal accumulated at 
a given rate of interest. The accompanying diagram 
shows the amount (y) of $100 at the end of each year 
(x) , compounded at 4 per cent per annum. 4 Now if $100 
be kept continuously invested at 4 per cent per annum, 
and if there are no losses either of principal or interest, 
and if the accumulated interest is immediately rein- 
vested at 4 per cent per annum, then, with these assump- 
tions, the amounts shown by the curve will result at the 
times indicated. The assumptions named, however, are 
not apt to be fulfilled for long periods in the business 
world. The assumptions are artificial. Assume that 

1 Pearson, ibid., p. 257. See also the historical summary on p. 261. 

2 Ibid., pp. 261-262. 

3 The law usually selected is the simplest, that is, the question 
usually put is this: On the average, does an increase or decrease of a 
given number of units in one series correspond to an increase or de- 
crease (or inversely) of some constant number of units in the other 
series ? This law is the one implied when the coefficient of correlation 
is used in the ordinary fashion. Artifices may be utilized for testing 
the adherence of measurements to more complicated laws. See Yule, 
Theory of Statistics, 1st ed., p. 202. 

4 The algebraic expression of the law is y = 100 (1.04)^x. 
thousands of investors having $100 each, deposit their 
principal in various savings banks paying, at the start, 
4 per cent; assume further that the accrued interest be 
deposited. There will be some total losses, some partial 
losses, and numerous changes of interest rates. Conse- 
quently, the actual accumulations of the depositors will 
be widely scattered. Possibly the points determined 
by the actual cases will not be grouped about the 4 per 
cent compound interest curve. Some other law may ex- 
press more accurately the relationship between capital 
accumulated and number of years elapsed. The law, 
whatever it may be, is one concept, a statement of the 
accuracy or universality of the law is a correlative con- 
cept. The law may be an inaccurate one. Indeed, there 
may be such a low degree of correlation between the two 
variables, capital accumulated and number of years 
elapsed, that the statement of any law is futile. 
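The compound interest law, and the departure of an actual case from it, may be sketched as follows. The law is that of the text, y = 100 (1.04)^x. The single depositor whose rate falls to 3 per cent after the fifth year is a hypothetical case introduced only for illustration.

```python
def law_amount(years, principal=100.0, rate=0.04):
    """Amount of the principal after `years` at compound interest: principal * (1 + rate) ** years."""
    return principal * (1 + rate) ** years

def actual_amount(yearly_rates, principal=100.0):
    """Amount accumulated when the rate of interest changes from year to year."""
    amount = principal
    for rate in yearly_rates:
        amount *= 1 + rate
    return amount

# The law: amounts at the end of selected years.
for x in (0, 5, 10, 20):
    print(x, round(law_amount(x), 2))

# Hypothetical depositor: 4 per cent for five years, then 3 per cent for five years.
depositor = actual_amount([0.04] * 5 + [0.03] * 5)
print("law at 10 years:", round(law_amount(10), 2))
print("depositor at 10 years:", round(depositor, 2))
```

The law gives about $148.02 at the end of ten years; the hypothetical depositor falls short of it, and thousands of such cases would scatter about the curve rather than lie upon it.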

Let us now return to the consideration of other con- 
cepts of the English school. In deriving the expression 
for the coefficient of correlation Pearson makes certain 
assumptions concerning the causes determining the 
sizes of the organs correlated. These assumptions are: 
first, " that the sizes of this complex of organs are de- 
termined by a great variety of independent contribu- 
tory causes, for example, magnitudes of other organs 
not in the complex, variations in environment, climate, 
nourishment, physical training, and innumerable other 
causes, which cannot be individually observed or their 
effects measured; " second, " that the variations in 
intensity of the contributory causes are small as compared 
with their absolute intensity, and that these variations 
follow the normal law of distribution." 1 On 
the basis of these assumptions Pearson has devised the 
expression for r, the coefficient of correlation, which we 
have previously given (see p. 123). 

It is clear that the statistical series of Galton and 
Pearson are unlike the time series of business statistics 
with which we are concerned at present. To be sure, 
the fundamental assumptions concerning the causes at 
work hold for economic as well as for biological data. 
The causes determining the monthly production of pig 
iron, for instance, or the interest rate, are very numer- 
ous. It is not easy, however, to establish the independ- 
ence of these causes. Thus, the pig-iron production will 
be greater or less because of numerous causes, such as 
the labor conditions, the efficiency of foremen, varia- 
tions of the railroad situation, the weather, the nature 
of the raw material, accidents of various kinds. With a 
given demand for pig iron at a given price there will be 
variation in production. But higher prices, or improved 
processes, may greatly stimulate production in old fur- 
naces and cause the construction of new ones. A period 
of prosperity, perhaps, may be analogous either to a 
set of dependent contributory causes or to a cause with 
great variations in intensity. 

Fortunately, in order to justify the use of the coeffi- 
cient of correlation, it is not necessary to demonstrate 
that the highly refined assumptions used in the mathe- 

1 Pearson, 



matical analysis hold in the world of industry. The 
justification for the use of the coefficient of correla- 
tion in economics depends, rather, upon the empirical 
considerations given in the preceding section (Part II). 
There are certain important differences, however, be- 
tween series of measurements of organs and indices of 
business activity that demand consideration. 

The essential differences with reference to the problem 
of correlation between measurements of organs and the 
items of time series are the following: 

1. The items of time series must be defined for a 
selected time unit; they measure one form of activity 
of industrial society, whereas biological measurements 
are individual observations of objects whose measure- 
ments are independent of time. 

2. Because they are defined for units of time, the 
items of time series are ordered in time; each item has a 
definite position with respect to the other items. The 
way to compare the items of two such series is to set up 
a relationship in time; that is, to pair concurrent items, 
or those definitely related in time. When we are corre- 
lating pig-iron production and interest rate, for instance, 
we wish to know not only the degree of correlation for 
one set of pairs, but the degree of correlation for various 
sets. There are various options possible in pairing 
items. On the other hand, when we are considering the 
correlation between measurements of organs, such as 
the stature and weight of the same individuals, or 
statures of fathers and sons, there is a unique pairing of 
items determined by the problem. The questions, does 
weight vary directly with stature, and do tall fathers 
have tall sons, obviously cannot be answered by pairing 
the stature of one individual with the weight of another 
or the stature of a father with the stature of his neigh- 
bor's son. The concept of lag is thus peculiar to time 
series. 

3. Because they are ordered in time, the adjacent 
items in time series may be determined by overlapping 
or persistent causes; a succession of items of similar 
sizes is the rule rather than the exception. Adjacent 
measurements of organs, on the other hand, are not so 
connected. Expressed in other words, the items of time 
series, even though corrected for secular trend and 
seasonal variation, are apt to show a high degree of 
correlation when paired with succeeding items of the 
same series, whereas the items of biological series show 
only a nominal degree of correlation when so paired. 
Thus, the coefficient of correlation for the cycles of pig- 
iron production of one month and that of two months 
following is +.84. 
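The correlation of a series with its own succeeding items may be computed by pairing each item with the item a fixed number of months later. The sketch below is an illustration under stated assumptions: the smooth wave is hypothetical and is not the pig-iron series, and the same values placed in arbitrary order stand in for measurements that have no order in time.

```python
import math
import random

def pearson(x, y):
    """Product-moment coefficient of correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def serial_r(series, lag):
    """Correlate each item with the item `lag` positions later in the same series."""
    return pearson(series[:-lag], series[lag:])

# Hypothetical cycle: a smooth wave of 40 months' period over 138 months.
cycle = [math.sin(2 * math.pi * t / 40) for t in range(138)]

# The same values in arbitrary order, as with measurements not ordered in time.
unordered = cycle[:]
random.Random(1919).shuffle(unordered)

ordered_r = serial_r(cycle, 2)
unordered_r = serial_r(unordered, 2)
print(f"ordered in time: r = {ordered_r:+.2f}")
print(f"arbitrary order: r = {unordered_r:+.2f}")
```

The high coefficient for the ordered series reflects nothing more than the persistence of the wave from month to month.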

In the preceding paragraph " overlapping causes " 
were mentioned. The manifestation of such causes, 
it was stated, is similarity between contiguous items 
of a series. Since we have eliminated secular trend 
and seasonal variation from each series to secure the 
" cycles," we have, presumably, eliminated the effect 
of two classes of overlapping causes before we correlate 
the cycles of any two series. The coefficient of correla- 

op. cit., p. 262. 
tion for two series of economic statistics will frequently 
be larger than it otherwise would be, however, because 
of the influence of the same set of overlapping causes 
operating upon both series during the business cycle. 
It does not appear legitimate, therefore, to compare 
coefficients of correlation obtained from time series of 
economic data with those obtained from biological 
measurements. Each set of coefficients measures a 
type of correlation peculiar to the data utilized. A 
coefficient of correlation computed from economic data 
is not significant merely because of its absolute size; it 
is significant because of its size as compared with the 
size of other coefficients computed for the same two 
series but for other pairings of the items. 

It is because of the numerous sets of influences oper- 
ating upon the items of time series as time goes on that 
such items must be corrected if we would isolate the 
effects of designated influences. In the present study 
we have corrected the items for secular trend and sea- 
sonal variation, and have found the correlation between 
the cycles. The correlation coefficient, thus found, 
measures the degree of similarity between the wave 
movements of two series. Comparison of coefficients 
computed for various pairings of items enables us to 
determine the lag for maximum correlation. A method 
of measuring another type of correlation between two 
time series has been suggested by R. H. Hooker. 1 The 
method " consists simply in calculating the correlation 
coefficients of the differences between successive values 
of the two variables, instead of the differences from the 
arithmetic means. . . . Correlation of the deviations 
from an instantaneous average (or trend) may be 
adopted to test the similarity of more or less marked 
periodic influences. Correlation of the difference be- 
tween successive values will probably prove most useful 
in cases where the similarity of the shorter rapid 
changes (with no apparent periodicity) are the subject 
of investigation, or where the normal level of one or 
both series of observations does not remain constant." 2 
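Hooker's method, as quoted, correlates the differences between successive values rather than the deviations from the mean. A minimal sketch follows. The two series are hypothetical: they share the same short fluctuations, but one rises and the other falls in level, which is the case in which the method is said to be most useful.

```python
import math

def pearson(x, y):
    """Product-moment coefficient of correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_differences(series):
    """Differences between successive values of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Hypothetical short fluctuation common to both series.
wiggle = [(0, 1, 0, -1)[t % 4] for t in range(40)]

# One series rises steadily in level, the other falls.
rising = [0.5 * t + w for t, w in enumerate(wiggle)]
falling = [-0.5 * t + w for t, w in enumerate(wiggle)]

r_levels = pearson(rising, falling)
r_differences = pearson(first_differences(rising), first_differences(falling))

print(f"deviations from the mean: r = {r_levels:+.2f}")
print(f"first differences:        r = {r_differences:+.2f}")
```

Correlation of the deviations from the mean is dominated by the opposite movement of the levels, whereas correlation of the first differences recovers the similarity of the short changes.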

Computation of the correlation coefficients of the 
differences between successive monthly or quarterly 
cycles would undoubtedly give us reliable measures 
where the lag is invariable. In case, however, the lag 
of the correlative series varies, say from three to six 
months, but nevertheless there is always a lag, the coeffi- 
cient computed for first differences will not indicate the 
fact. In other words, a low coefficient may occur when 
the cyclical fluctuations are similar but are not invari- 
ably separated by the same time interval. This is pre- 
cisely the situation before us in the present study. The 
method of correlation adopted, therefore, as most satis- 
factory is that of comparison of the cycle charts over 
the illuminated box, the conclusions being checked by 
the computation of coefficients of correlation between 
the cycles. 3 

1 " On the Correlation of Successive Observations," in the Journal 
of the Royal Statistical Society, December, 1905, pp. 696-703. 

2 Ibid., pp. 697, 703. 

3 The variate difference correlation method does not yield reliable 
results for series such as we are dealing with in this paper. See " The 



What is the significance of a conclusion (based upon 
inspection of the curves and consideration of the coeffi- 
cients of correlation) that the graphs of the cycles of 
two series of business statistics are highly similar ? If 
the cycles of one series systematically precede the cycles 
of another series does it mean that the movements of 
the first cause the movements of the second ? If so, 
how much similarity between the two series is necessary 
for us to term the first series " the cause " and the 
second series " the effect " ? Or, are both series to be 
considered effects of a complex of joint causes ? If so, 
are the dissimilarities due to differences in the constitu- 
tion of the complex of joint causes determining each, to 
the varying intensity of like joint causes, or to both ? 
Should we think of the various series as being symptoms 
of underlying economic conditions and of our problem 
as the sorting, analyzing, and judging of symptoms in 
order to secure an accurate description of those condi- 
tions ? More specifically, what is the meaning of the 
similarity between the cycles of pig-iron production 
and the interest rate on sixty-to-ninety day commer- 
cial paper, the latter having a maximum correlation 
coefficient of +.75 for a six months lag ? 

In the present study we have made comparison of one 
series with another in order to secure a concise state- 
ment of how changes are taking place. The rationale 
we have used differs in no way from that of any other 
scientific investigation. The modern scientific point of 
view has been effectively stated by Karl Pearson. We 
cannot do better than quote his arguments. 

" That a certain sequence has occurred and recurred 
in the past is a matter of experience to which we give 
expression in the concept causation; that it will con- 
tinue to recur in the future is a matter of belief to which 
we give expression in the concept probability. Science 
in no case can demonstrate any inherent necessity in a 
sequence, nor prove with absolute certainty that it 
must be repeated. Science for the past is a description, 
for the future a belief. 4 ..." 

" In the struggle for existence man has won his dic- 
tatorship over other forms of life by his power of fore- 
seeing the effects which flow from antecedent causes — 
not only by his memory of past experience, but by his 
power of codifying natural law, that is, by his power of 
generalising experience in scientific statements. 5 . . ." 

" Anything is a cause which antedates or accompanies 
a phenomenon, and we ask if we vary that cause to what 
degree we vary or change the phenomenon. If we say 
the variation of the cause produces no effect on the 
phenomenon we have absolute independence; if we 
found variation of this cause absolutely and alone 
varied the phenomenon we should say that there was 
absolute dependence. Such absolute dependence of a 

Variate Difference Correlation Method and Curve-Fitting " by 
Warren M. Persons, Quarterly Publications of the American Statistical 
Association, June, 1917. 

4 Karl Pearson, Grammar of Science, 3d ed., Pt. I, p. 113 (1911). 

5 Ibid., p. 137. 
phenomenon on a single measurable cause is certainly 
the exception, if it ever exists when the refinement of 
observation is intense enough. . . . But between these 
two limits of absolute independence and absolute de- 
pendence all grades of association may occur. When 
we vary the cause, the phenomenon changes, but not 
always to the same extent; it changes, but has variation 
in its change. The less the variation in that change the 
more nearly the cause defines the phenomena, the more 
closely we assert the association or the correlation to 
be. It is this conception of correlation between two 
occurrences embracing all relationship from absolute 
independence to complete dependence, which is the 
wider category by which we have to replace the old idea 
of causation. Everything in the universe occurs but 
once, there is no absolute sameness of repetition. In- 
dividual phenomena can only be classified, and our 
problem turns on how far a group or class of like, but 
not absolutely same, things which we term ' causes ' 
will be accompanied or followed by another group or 
class of like, but not absolutely same things which we 
term ' effects.' 1 . . ." 

" All classes of phenomena are linked together, and 
the problem in each case is how close is the degree of 
association. Likeness of causes produces likeness of 
effects; we can measure the degree of likeness, whether 
we are dealing with a chemical reaction or with the 
resemblance in any aptitude between parent and child. 
There is no question of absolute sameness in either case; 
there is a wide degree of difference in the likeness. 2 . . ." 

" There is always in non-organic as in organic phe- 
nomena a residual variation. Repetitions are like 
within limits, but not same, for the antecedents are only 
like but never same. From this standpoint the universe 
appears as a universe of variation rather than as a uni- 
verse controlled by the law of causation in its narrowest 
sense. No phenomena are causal; all phenomena are 
contingent, and the problem before us is to measure the 
degree of this contingency, which we have seen lies 
between the zero of independence and the unity of 
causation. That is briefly the wider outlook we must 
now take of the universe as we experience it." 3 

From this standpoint there is " no distinction in kind 
but only in degree between the data, method of treat- 
ment, or the resulting ' laws ' of chemical, physical, 
biological, or sociological investigations. They all pro- 
vide, or should provide, (i.) a conceptual routine, which 
is a functional expression of average experience, and 

1 Karl Pearson, Grammar of Science, 3d ed., Pt. I, p. 113 (1911), 
p. 157. 

2 Ibid., p. 166. 

3 Ibid., p. 174. 

4 Ibid., p. 173. 

6 Ibid., p. 113. 

6 " The probable error is merely a quantity such that we may 
expect greater and less errors of simple sampling with about equal 
frequency, provided always that the distribution of errors is normal." 
Yule, Theory of Statistics, 1st ed., p. 307. 

7 Some writers on economic statistics have confused the notions of 
correlation (closeness with which the actual corresponding fluctuations 
obey any law which we may select) and of the law which we select for 
the test. For instance, see the Note by James D. Magee and Re- 
joinder by Warren M. Persons in the Quarterly Publications of the 
American Statistical Association, December, 1914, pp. 345-350. In an 
article on " The Correlation of Historical Economic Variables and 



(ii.) a measure of the possible or probable deviations 
from this routine, which is a guide to the amount of 
variation in experience. 4 ..." This variation is small 
in physical experiences and large in economic experi- 
ences. In all types of experiences, however, our prob- 
lem is to discover system in the midst of variation. Our 
problem in dealing with experience is not solved when 
we find a " functional expression of average experi- 
ence"; we must also measure "the amount of vari- 
ation in experience." Our problem does not even end 
here. The object of summarizing experience is to 
use that experience as a basis of future conduct or 
action. " Science cannot demonstrate that a cataclysm 
will not engulf the universe tomorrow, but it can prove 
that past experience, so far from providing a shred of 
evidence in favour of any such occurrence, does, even 
in the light of our ignorance of any necessity in the 
sequence of our perceptions, give an overwhelming 
probability against such a cataclysm." 6 The probabil- 
ity that a discovered sequence will be repeated in the 
future depends, obviously, upon the breadth of the 
sample or the number of times the sequence has been 
observed and upon the variability of the sequence. 
Consequently, (i.) a " functional expression of average 
experience," (ii.) " the amount of variation in experi- 
ence," and (iii.) "the probable error of experience" 6 are 
all necessary in utilizing experience as a basis of action. 7 

In illustration of the points just made let us consider 
once again the relationship between the cycles of pig- 
iron production and the interest rate on sixty-to-ninety 
day commercial paper. Using the data for the period 
January 1903- July 1914, we find that pig-iron produc- 
tion varies directly with the interest rate six months 
later. The linear functional expression for interest rate 
(y) in terms of pig-iron production (x) is 

$y = +.758x + .061$. 

The coefficient of correlation of +.75, being less than 
unity, expresses the fact that there is " variation in 
experience." The probable error of ±.03 is a measure 
of the breadth of the sample utilized for the coefficient 
just given. But are there not also " probable errors," 
so to speak, in the lag and in the type of mathematical 
function used ? There are. The lag or interval varies 
from three to nine months; an interval of six months 
merely seems to be more persistent during the period 
studied than other intervals that may be selected. 
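
The probable error quoted above can be checked with a short Python sketch. It assumes the classical formula for the probable error of a correlation coefficient, 0.6745(1 - r^2)/sqrt(n), and takes n = 133 from the number of paired items reported later in the text; the function name is a hypothetical one introduced here.

```python
from math import sqrt

def probable_error_of_r(r, n):
    """Classical probable error of a correlation coefficient r from n pairs."""
    return 0.6745 * (1 - r * r) / sqrt(n)

r = 0.75   # coefficient of correlation reported in the text
n = 133    # number of paired monthly items (stated later in the text)
pe = probable_error_of_r(r, n)
print(round(pe, 3))  # about 0.026, which rounds to the .03 given above
```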

the Misuse of Coefficients in this Connection " in the Quarterly Publications 
of the American Statistical Association, December, 1917, pp. 
847-853, Dr. Willford I. King has concepts of economic law, causa- 
tion, and correlation quite different from those defined by Karl 
Pearson and accepted by the present writer. Dr. King says, " A law 
which only works part of the time is no law at all. . . . There can be 
no such thing as imperfect or partial correlation. Every correlation 
to exist must be perfect. Every cause must produce its effect exactly 
and invariably " (pp. 848, 850). Dr. E. E. Day has taken issue with 
Dr. King (see Quarterly Publications of the American Statistical Asso- 
ciation, June, 1918, pp. 115-118). The present writer agrees with Dr. 
Day that Dr. King's " discussion of the nature of correlation is so 
extreme and inflexible as to be wholly misleading " and that it " will 
add materially to the confusion regarding the nature of correlation and 
the correlation coefficient." Ibid., pp. 115, 188. 
TABLE K. — CORRELATION TABLE OF CYCLES FOR (a) MONTHLY PRODUCTION OF PIG IRON, JANUARY 
1903-JANUARY 1914 AND (b) MONTHLY INTEREST RATE ON SIXTY-TO-NINETY 
DAY PAPER SIX MONTHS LATER, JULY 1903-JULY 1914 

If the extent and nature of our data warranted the 
expenditure of further labor we might secure a more 
comprehensive description of the lag than one which 
merely is " six months on the average." In the present 
case, however, a more comprehensive description of the 
lag is not useful. Likewise, a more complicated func- 
tion might be found which would describe the relation- 
ship for the period studied more accurately than one of 
the first degree. But we are looking for simple relation- 
ships. The more complicated the relationships the less 
valuable the results. It may be said that for over 95 
per cent of economic series it is not worth while to 
search for a more complicated functional expression 
between the variables than one of the first degree. In 
order to make it entirely clear, however, that it is 
possible to use other than linear functions, a function of 
the second degree (a parabola) will be fitted to the 
paired items of pig-iron production and interest rate 
on sixty-to-ninety day paper. 



Our present problem, then, is to fit a parabola (as 
well as the straight line already found) to the data for 
the production of pig iron and interest rate on sixty-to- 
ninety day commercial paper as given in the correlation 
table (Table K). Chart H (p. 136) presents the results. 
In fitting the parabola to the data, pig-iron production 
(x) was considered the independent variable, and interest 
rate (y) the dependent variable. 1 In other words, if we 
assign a given value to pig-iron production and sub- 
stitute that value for x in the equation of the parabola, 
$y = -.074 + .829x + .147x^2$, the resulting value of y is 
our estimate of what the interest rate will be six months 
later. It must be remembered, however, that the corre- 
lation between pig-iron production and interest rate is 

1 The method of moments was used. The values of the constants 
in the equation $y = a + bx + cx^2$ were found by solving the following 
simultaneous equations: 

$\sum y = na + b\sum x + c\sum x^2$ 
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$ 
$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$. 
not perfect, and hence that the same interest rate does 
not invariably follow a given pig-iron production. There 
is more or less scatter or variability. The value of y 
that we get for a given x is really an average rate of 
Chart H. — Comparison of the results of fitting the two functions for 
(a) a straight line, and (b) a parabola, respectively, to the data 
presented in Table K. 

interest that corresponds to a given pig-iron production 
six months previous. In Chart H the solid circles are 
located by using as abscissas the mid-points (-2.4, 
-2.1, . . . 0 . . . +1.2, +1.5) of the class intervals 
of pig-iron production and as ordinates the corresponding 
averages of interest rate (-1.00, -1.38, -.90, 
-1.05, -.50, -.76, -.46, -.70, -.08, +.30, +.48, 
+.94, +.98, +1.30). The areas of the solid circles are 
proportionate to the frequencies (3, 5, 2, 4, 3, 11, 15, 9, 
Chart J. — Straight line fitted to interest rate as the independent 
variable and pig-iron production as the dependent 
variable (Data of Table K). 

8, 13, 22, 23, 12, 3). If all of these circles were located 
exactly upon the straight line or upon the parabola, our 
estimate of an average y for a given x, by means of the 
proper equation, would be exact. The precision of our 



estimate varies directly with the closeness with which 
the circles are grouped about the line or curve. A deci- 
sion as to the best function to use in any given case can 
usually be reached by simple inspection of the fit of the 
graphs to the points. 
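
The fitting just described can be sketched in Python from the class mid-points, class averages, and frequencies quoted above. This is an illustrative reconstruction, not the original computation: it solves the method-of-moments (normal) equations with each class weighted by its frequency, and the helper names are assumptions. Because the class averages are rounded in the text, the constants obtained are close to, but not identical with, the published ones.

```python
def solve(A, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (M[r][n] - s) / M[r][r]
    return sol

def fit_polynomial(xs, ys, fs, degree):
    """Frequency-weighted moment equations; returns [a, b, ...], constant first."""
    def sx(p):
        return sum(f * x ** p for x, f in zip(xs, fs))
    def sxy(p):
        return sum(f * y * x ** p for x, y, f in zip(xs, ys, fs))
    A = [[sx(i + j) for j in range(degree + 1)] for i in range(degree + 1)]
    rhs = [sxy(i) for i in range(degree + 1)]
    return solve(A, rhs)

def weighted_sse(coeffs, xs, ys, fs):
    """Frequency-weighted sum of squared deviations of class averages from the curve."""
    return sum(f * (y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
               for x, y, f in zip(xs, ys, fs))

# Data quoted in the text for Chart H (pig-iron production independent).
mid = [round(-2.4 + 0.3 * i, 1) for i in range(14)]
avg = [-1.00, -1.38, -.90, -1.05, -.50, -.76, -.46, -.70,
       -.08, .30, .48, .94, .98, 1.30]
freq = [3, 5, 2, 4, 3, 11, 15, 9, 8, 13, 22, 23, 12, 3]

line = fit_polynomial(mid, avg, freq, 1)       # [intercept, slope]
parabola = fit_polynomial(mid, avg, freq, 2)   # [a, b, c]
print("line:", [round(v, 3) for v in line])
print("parabola:", [round(v, 3) for v in parabola])
print("weighted SSE:", weighted_sse(line, mid, avg, freq),
      weighted_sse(parabola, mid, avg, freq))
```

Comparing the two weighted sums of squares gives a numerical counterpart to the "simple inspection of the fit" recommended above; the parabola can never fit worse than the line, so the question is whether the improvement is worth the added complexity.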

Thus far in our discussion of fitting a straight line 
and a parabola to interest rate and pig-iron production, 
the former has been taken as the dependent variable, 
and the latter as the independent variable; so that by 
use of the equation secured, the former could be esti- 
mated from the latter. Cannot we reverse this process 
and take interest rate as the independent variable ? 
That is, cannot we estimate average production of pig 
iron from a given interest rate ? We can, but not by 
use of the same function. Chart J (above) shows the 
straight line which we obtain by taking interest rate 
as the independent variable and pig-iron production 



[Chart K legend. Coordinates of points: x, production of pig iron; 
y, interest rate on 60-90 day paper (six months lag). Equations: 
x independent, y = .758x + .061; y independent, y = 1.346x + .040.] 

Chart K. — Points located by pairing items of (a) production of pig 
iron and (b) interest rate on sixty-to-ninety day paper six months 
later, together with the two lines fitted to the data of Table D. 

as the dependent variable. In other words, the line 
is fitted to the solid circles whose ordinates are the 
mid-points (-1.5, -1.2, . . . 0 . . . +2.4, +2.7) of 
the class intervals of interest rate, whose abscissas 
are the corresponding averages of pig-iron production 
(-1.29, -1.40, -.76, -.40, 0, +.02, +.30, +.75, 
+.86, +.79, +.66, +.90, +1.05, +.90, +1.20) and 
whose areas are proportionate to the frequencies (7, 9, 
17, 15, 11, 13, 11, 10, 16, 11, 5, 4, 2, 1, 1). The line in 
Chart J enables us to estimate average pig-iron produc- 
tion from a given interest rate. In like manner, instead 
of the straight line, a parabola or other curve might be 
fitted to these same points and be utilized for estimates. 
The two lines which have been fitted (considering, 
first pig-iron production, and then interest rate six 
months later, as the independent variable) to the 133 
points located by paired items are not identical. This 
lack of identity is due to the fact that the correlation 
between the series is not perfect. The more rigid the 
correlation between the two series the closer will the 
two lines approach coincidence; when r = ±1 the lines 
will coincide. The 133 points, together with the two 
lines, are shown in Chart K. 
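
The connection between the two fitted lines and the coefficient of correlation can be made concrete with a small Python sketch. It relies on the standard result that the product of the two regression slopes (y on x, and x on y) equals r squared. The first slope, .758, is given in the text; the second line is taken to be y = 1.346x + .040 as read from the partly illegible legend of Chart K, so that value is an assumption, as is the function name.

```python
from math import copysign, sqrt

def implied_correlation(slope_y_on_x, slope_x_on_y):
    """r implied by the two regression slopes; their product is r squared."""
    product = slope_y_on_x * slope_x_on_y
    return copysign(sqrt(product), slope_y_on_x)

b_yx = 0.758          # y estimated from x: y = .758x + .061 (from the text)
b_xy = 1 / 1.346      # x estimated from y; line drawn as y = 1.346x + .040 (assumed reading)
r_implied = implied_correlation(b_yx, b_xy)
print(round(r_implied, 3))

# With perfect correlation the two lines coincide: the slopes are reciprocals.
print(implied_correlation(2.0, 0.5))
```

Under the assumed reading of the chart, the implied coefficient agrees closely with the +.75 reported in the text.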

Chart L. — Frequency distributions of (a) pig-iron production, 
January 1903-January 1914, and (b) interest rate on sixty-to-ninety 
day paper, July 1903-July 1914, presented in the form of 
rectangular diagrams. 

To conclude this general discussion of correlation, it 
is necessary to consider one further point. In the 
derivation of the coefficient of correlation by Karl 
Pearson and others, it was assumed that the items of 
each series were distributed according to the normal 
law. Biological measurements, such as stature, weight, 
and intellectual capacity, are distributed according to 
that law. But are economic measurements, such as 
production of pig iron, interest rate on sixty-to-ninety 
day paper, and value of building permits, distributed 
according to the same normal law ? Chart L (above) 
shows the frequency distributions of production of pig 
iron and interest rate on sixty-to-ninety day paper. 
For narrow class intervals, the rectangular diagram 
for the production of pig iron reveals a bimodal 
distribution; for wider class intervals the distribution is 
approximately normal. The same type of distribution, 
though less marked, is revealed by the diagram for 
interest rate on sixty-to-ninety day paper; in this 
case the distribution is " skewed " or unsymmetrical. 
This divergence from normal distribution shown in each 
diagram, however, is not marked enough to invalidate 
the present application of the Pearsonian coefficient of 
correlation to economic data. 

Theoretically, the irregular fluctuations rather than 
the cyclical fluctuations are the ones which we should 
expect to be distributed according to the normal law of 
distribution of errors. In order to test this supposition, 
let us attempt to isolate the irregular fluctuations in a 
series wherein they appear to a marked degree, e. g., 
the value of building permits (Series 6). 

The process adopted of isolating the irregular fluc- 
tuations for the value of building permits, 1903-16, is 
as follows: 

(1) Monthly relatives were computed, each item be- 
ing expressed as a relative with the corresponding ordi- 
nate of the secular trend as the base, thus presumably 
eliminating the secular trend. 

Other relatives were then computed, by using the 
series described in the preceding paragraph as the 
numerators, and the appropriate indices of seasonal 
variation (the computation of which is described in 
Part III, Section 1 of this paper) as the denominators. 
This operation presumably eliminates, in addition to 
the secular trend, the seasonal variation, but it does 
not eliminate the cycles. 

(2) There were computed the percentages of the 
TABLE L. — IRREGULAR FLUCTUATIONS* OF MONTHLY VALUES OF BUILDING PERMITS ISSUED 
FOR TWENTY LEADING CITIES 

(Frequency Table, by Months, July 1903-June 1916) 

Irregular Fluctuations,      Frequency 
Percentages                  (Total, all months) 

+46.0 . . .                      1 
+35.0 . . .                      1 
+32.3 . . .                      1 
+21.9 . . .                      4 
+17.9 . . .                      8 
+13.9 . . .                     10 
+ 9.9 . . .                     15 
+ 5.9 . . .                     23 
+ 1.9 . . .                     23 
- 2.1 . . .                     19 
- 6.1 . . .                     19 
-10.1 . . .                     15 
-14.1 . . .                      9 
-18.1 . . .                      4 
-22.1 . . .                      3 
-26.1 . . .                      1 

Total                          156 

[The separate frequency columns for January to December are not 
legible in this copy; each monthly column totals 13.] 


* See text p. 137 for explanation of the method of securing irregular 
fluctuations. The seven irregular fluctuations greater than +17.9 
are for the following dates: August 1911 (+46.0), December 1912 
(+35.0), December 1913 (+32.3), December 1915 (+21.7), July 1914 
(+20.3), February 1909 (+20.2), May 1907 (+18.2). 

The eight fluctuations less than -18.0 are for the following dates: 
November 1914 (-27.1), May 1911 (-25.0), February 1904 (-23.7), 
January 1908 (-23.2), March 1908 (-21.8), February 1908 (-21.7), 
April 1916 (-19.6), November 1907 (-18.5). 



twelve months moving averages to the corresponding 
ordinates of the secular trend as bases. This process 
(by averaging) presumably eliminates the irregular 
variations and seasonal variations; and (by taking 
ratios) the secular trend, but it does not eliminate the 
cycles. 

(3) The items found in (2) above were subtracted 
from the items described in (1). The irregular varia- 
tions were present in (1) but not in (2). The result- 
ing differences are, presumably, approximations to the 
irregular fluctuations. 

(4) The differences found above were tabulated in 
the form of a frequency table (Table L). From the 

1 Let $x_1, x_2, x_3, \ldots, x_n$ be the actual or original series and 
$o_1, o_2, o_3, \ldots, o_n$ be the corresponding ordinates of linear 
secular trend. Then, the secular trend is eliminated (by the process 
of dividing) in the series of ratios 
$\frac{x_1}{o_1}, \frac{x_2}{o_2}, \frac{x_3}{o_3}, \ldots, \frac{x_n}{o_n}$. 
Further, let $s_1, s_2, \ldots, s_{12}$ be the monthly indices of seasonal 
variation. Then, both the secular trend and seasonal movement (but not 
the cyclical movement) are eliminated in the series of ratios 

$\frac{x_1}{o_1 s_1}, \frac{x_2}{o_2 s_2}, \ldots, \frac{x_{12}}{o_{12} s_{12}}, \frac{x_{13}}{o_{13} s_1}, \frac{x_{14}}{o_{14} s_2}, \ldots, \frac{x_{24}}{o_{24} s_{12}}$; etc. (1) 

Still further, let $m_7, m_8, \ldots, m_{n-6}$ be the adjusted twelve months 
moving averages of the original items. Thus, 

$m_7 = \frac{1}{2}\left(\frac{x_1 + x_2 + \cdots + x_{12}}{12} + \frac{x_2 + x_3 + \cdots + x_{13}}{12}\right) = \frac{x_1 + 2x_2 + 2x_3 + \cdots + 2x_{11} + 2x_{12} + x_{13}}{24}$, and 

$m_{n-6} = \frac{x_{n-12} + 2x_{n-11} + 2x_{n-10} + \cdots + 2x_{n-2} + 2x_{n-1} + x_n}{24}$. 

Any given moving average, $m_i$, corresponds to an original item, $x_i$, 
and to an ordinate of secular trend, $o_i$, with the same subscripts. In 



data in this table there was drawn a rectangular dia- 
gram. Finally, to this diagram there was fitted a nor- 
mal curve (Chart M). 1 
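
Steps (1) to (3) above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the original computation: it assumes the series begins with a January, that the seasonal indices are expressed as ratios (1.0 for an average month), and that the moving average is the centered twelve-month average defined in the accompanying footnote. The function names are hypothetical.

```python
def centered_moving_average(x):
    """Adjusted twelve-month moving averages, keyed by 0-based month index."""
    n = len(x)
    return {i: (x[i - 6] + 2 * sum(x[i - 5:i + 6]) + x[i + 6]) / 24.0
            for i in range(6, n - 6)}

def irregular_fluctuations(x, trend, seasonal):
    """Percentage irregular fluctuations: series (1) minus series (2).

    x        : original monthly items, first item a January
    trend    : ordinates of secular trend, same length as x
    seasonal : twelve seasonal indices as ratios (1.0 = average month)
    """
    m = centered_moving_average(x)
    out = []
    for i in sorted(m):
        # Step (1): remove secular trend and seasonal variation.
        corrected = x[i] / (trend[i] * seasonal[i % 12])
        # Step (2): moving average as a ratio to trend (cycle only).
        cycle = m[i] / trend[i]
        # Step (3): the difference approximates the irregular fluctuation.
        out.append(100.0 * (corrected - cycle))
    return out

# Hypothetical example: a flat series with one exceptional month.
x = [100.0] * 36
x[12] = 124.0
irr = irregular_fluctuations(x, [100.0] * 36, [1.0] * 12)
print(len(irr), round(irr[6], 2))
```

Six months are lost at each end of the series, so fourteen years of monthly items (168 months) would yield 156 differences, which matches the total of Table L.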

It is evident from Chart M that the distribution of 
the irregular fluctuations of the value of building per- 
mits is normal. The supposition with which we started 
is, therefore, confirmed. 

In the preceding sections of Part II we have described 
the methods which we have adopted of measuring the 
correlation between statistical series. We have also 
analyzed the nature of the concepts and arguments in- 
volved in those methods. In Parts III and IV we shall 

the moving averages, small irregular variations and seasonal variations 
are " smoothed " out by the process of averaging; the secular 
trend and the cyclical movement (including the persistent irregular 
variations) are retained. If we take the ratios 

$\frac{m_7}{o_7}, \frac{m_8}{o_8}, \ldots, \frac{m_{n-6}}{o_{n-6}}$ (2) 

the secular trend will be eliminated by the process, leaving the wave-like 
or cyclical movements isolated. 

To summarize, in series (1) the secular trend and seasonal variation 
are eliminated, leaving the cyclical and irregular fluctuations as a 
residuum. In series (2) the secular trend, seasonal variation, and 
irregular variations are eliminated, or nearly eliminated, leaving the 
cyclical fluctuations isolated. Consequently, the differences between 
corresponding items of series (1) and series (2), which are 

$\left(\frac{x_7}{o_7 s_7} - \frac{m_7}{o_7}\right), \left(\frac{x_8}{o_8 s_8} - \frac{m_8}{o_8}\right), \ldots, \left(\frac{x_{n-6}}{o_{n-6} s_6} - \frac{m_{n-6}}{o_{n-6}}\right)$ 

constitute approximations to the irregular fluctuations. In other 
words, the irregular fluctuations are thus isolated. (We might, of 
course, have taken the ratios $\frac{x_7}{m_7 s_7}, \frac{x_8}{m_8 s_8}, \ldots$ 
instead of the differences and secured approximately the same results.) 
give the results of applying the methods to numerous 
series of business statistics. Part III gives an outline 
of the method of correcting individual series for secular 
trend and seasonal variation, and the results of the 
Chart M. — Rectangular diagram of the irregular fluctuations of 
monthly values of building permits issued for twenty leading cities 
(Table L) and the normal curve of distribution fitted to same, the 
irregular fluctuations being secured by correcting the original items 
for secular trend, cyclical fluctuations, and seasonal fluctuations. 



application of that method. 1 Part IV gives an outline 
of the method of comparing the corrected series and the 
results of the application of that method. A general 
statement of the results secured in Parts III and IV is 
given in Part I. 



III. APPLICATION OF THE METHOD TO THE 
DATA, (A) THE INDIVIDUAL SERIES 

1. Outline of the Method 

(a) The data necessary for the application of the 
method are homogeneous monthly series of statistics 
covering a period of, say, fifteen years or more. 

(b) The linear secular trend is found by fitting a 
straight line to annual items by the method of least 
squares. The compound interest curve, parabola, or 
other curves are used in case a straight line does not 
give satisfactory results. 

(c) Indices of seasonal variation are found by taking 
the medians of month-to-month percentages for each 
of the twelve months. The medians, thus found, are 
progressively multiplied to form a continuous series, 
the discrepancy due to secular trend is distributed, and 
the average is made equal to 100. 

(d) The original items are corrected for secular trend 

1 The results for Series 1 to 15 are given in the January Review, 
pp. 37-48. 



and seasonal variation, as follows: each monthly ordi- 
nate of secular trend is multiplied by the index of sea- 
sonal variation for that month; the resulting product 
is subtracted from the corresponding original item, and 
then expressed as a percentage of the ordinate of secu- 
lar trend. We thus secure " percentage deviations of 
the original items from the corresponding ordinates of 
secular trend corrected for seasonal variation." 

(e) The percentage deviations of the various series 
are expressed in terms of their respective standard 
deviations, in order to secure comparable cyclical 
fluctuations. 
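
Steps (c), (d), and (e) of the outline above can be sketched in Python. The sketch is a hedged illustration rather than the original procedure: it assumes the series begins with a January, it distributes the chaining discrepancy geometrically over the twelve months (the outline does not say how the discrepancy is distributed), and it uses the population standard deviation. All function names are hypothetical.

```python
from statistics import median, pstdev

def seasonal_indices(x):
    """Step (c): medians of month-to-month link relatives, chained and adjusted."""
    relatives = [[] for _ in range(12)]
    for t in range(1, len(x)):
        relatives[t % 12].append(100.0 * x[t] / x[t - 1])
    med = [median(r) for r in relatives]
    chain = [100.0]                          # January taken as 100
    for month in range(1, 12):
        chain.append(chain[-1] * med[month] / 100.0)
    closing = chain[-1] * med[0] / 100.0     # January reached again via December
    d = closing / 100.0                      # discrepancy ascribed to secular trend
    # Assumption: spread the discrepancy geometrically across the months.
    adjusted = [c / d ** (month / 12.0) for month, c in enumerate(chain)]
    mean = sum(adjusted) / 12.0
    return [100.0 * a / mean for a in adjusted]   # average made equal to 100

def percentage_deviations(x, trend, s):
    """Step (d): deviation from trend corrected for seasonal, as per cent of trend."""
    return [100.0 * (xi - oi * s[t % 12] / 100.0) / oi
            for t, (xi, oi) in enumerate(zip(x, trend))]

def in_standard_deviation_units(deviations):
    """Step (e): express deviations in units of their standard deviation."""
    sd = pstdev(deviations)
    return [v / sd for v in deviations]

# Hypothetical series: a fixed seasonal pattern repeated for five years, no trend.
pattern = [90, 95, 100, 105, 110, 115, 110, 105, 100, 95, 90, 85]
x = [float(v) for v in pattern * 5]
s = seasonal_indices(x)
dev = percentage_deviations(x, [100.0] * len(x), s)
print([round(v, 1) for v in s])
```

For this artificial series the indices recover the seasonal pattern itself and the percentage deviations vanish, since nothing but seasonal variation is present.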

2. Summary of Results for Each Series 

The source and nature of the series presented graphi- 
cally in Charts 16 to 18, and 20 to 24 are given in the 
following summary. Additional information is given 
for Series 4 and 9. The numbers of the series (in paren- 
thesis) are the same as those of the charts and frequency 
tables constructed from such series. Tables of the 
original series, the moving averages, the month-to- 
month link relatives, and the percentage deviations of 
the original items from the corresponding ordinates of 
secular trend corrected for seasonal variation are given 
in connection with Charts 16 to 18, and 20 to 24. The 
percentage deviations, just named, are presented 
graphically in Charts 116 to 118, 120, and 122 to 124, 
the units used being the respective standard devia- 
tions. The digits, in the units and tens places of the 
chart numbers, correspond to the number of the series 
upon which they are based. (For instance, Chart 16, 
Table 16, and Chart 116 all pertain to Series 16.) The 
equations of the lines of secular trend and the standard 
deviations for each series are given in the summary for 
that series. The indices of seasonal variation are given 
in the frequency tables of link relatives; graphs of the 
indices are presented as inserts of Charts 16 to 18, and 
20 to 24. 

4. Bradstreet's Monthly Index of 
Commodity Prices 

Source and Nature of Data. As stated in the 
Review of January 1919 (p. 41), this series has been 
transcribed from summaries given in Bradstreet's. The 
following description is adapted from " Index Numbers 
of Wholesale Prices in the United States and Foreign 
Countries," in Bulletin of the United States Bureau of 
Labor Statistics, No. 173, pp. 109-112, 141-148. 

The " index " is the aggregate of the per pound 
wholesale prices of selected staple commodities. The 
source of these quotations is not disclosed, but it is 
stated that they are from the primary markets. The 
index, expressed in dollars and cents, is published as of 
the first of each month in the department of Bradstreet's 
called " Measures of Movements." No revision of the 
method of constructing the index number appears to 
have been made since December 16, 1905. The exhibit 
as then published contained the index number by 
quarters from January 1, 1892 to October 1, 1898, and 



